DOCUMENT RESUME

ED 300 414                                        TM 012 410

TITLE           Assessment Handbook: A Guide for Assessing Illinois'
                Students.
INSTITUTION     Illinois State Board of Education, Springfield.
PUB DATE        88
NOTE            106p.
PUB TYPE        Guides - Non-Classroom Use (055)

EDRS PRICE      MF01/PC05 Plus Postage.
DESCRIPTORS     *Achievement Tests; Curriculum Development;
                Educational Assessment; Elementary Secondary
                Education; Evaluation Methods; Evaluation
                Utilization; Recordkeeping; Resource Materials;
                *School Districts; *Standardized Tests; *State
                Programs; *Student Evaluation; Test Items
IDENTIFIERS     *Illinois

ABSTRACT

This manual, which is provided as a resource for
Illinois school districts and which incorporates some of the
materials in the Alaska Department of Education's Assessment 
Handbook, is designed to meet the requirements of Public Act 84-126 
and to help practitioners develop student assessment plans of local 
usefulness. Areas covered in the guidelines include the nature of 
assessment and a comprehensive assessment system, involvement of 
staff and other constituent groups, types of assessment, alignment of 
assessment with curriculum and instruction, selection of standardized 
achievement tests, construction of local assessment procedures, 
administration of assessment programs, keeping records, and reporting 
assessment results. Assessment planning roles for teachers, 
administrators, curriculum coordinators, school board members, and 
parents are outlined. Other important considerations addressed 
include maximizing validity and reliability, avoiding bias, using 
multiple approaches, assessing different cognitive levels, using 
assessment procedures for multiple purposes, and developing an 
assessment schedule. Appendices include a glossary of assessment 
terms, a list of contact persons for proposals for funding for 
development and dissemination of effective assessment practices, and 
guidelines for improving test questions. (TJH) 



* Reproductions supplied by EDRS are the best that can be made * 

* from the original document. * 




U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

• This document has been reproduced as
received from the person or organization
originating it.
• Minor changes have been made to improve
reproduction quality.

• Points of view or opinions stated in this document
do not necessarily represent official
OERI position or policy.



"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY 



TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC)." 



ASSESSMENT
HANDBOOK

A Guide for Assessing
Illinois' Students










AN ASSESSMENT HANDBOOK 
FOR ILLINOIS SCHOOLS 



Illinois State Board of Education 

1988 



©Copyright 1988, Illinois State Board of Education. 



(Printed by Authority of the State of Illinois)
(65483-5M-4-88)






Foreword 



This manual is provided as a resource for Illinois school districts. The 
manual is three-hole punched so that it can be kept in a loose-leaf note- 
book and used as needed for local planning and development. Individu- 
al sections can be removed and duplicated. Additional materials can 
be inserted. In the future, we will issue additional chapters as needed. 

We appreciate the graciousness of the Alaska Department of Education
in allowing us to adapt and use materials from their Assessment
Handbook. Those materials, which are acknowledged in individual
chapters, have been of great assistance in the preparation of this
manual.

If you have questions about the information in this manual, contact 
the Student Assessment Section at 217/782-4823. 



Ted Sanders

State Superintendent of Education 

This handbook is being distributed to help Illinois
school districts meet the requirements of P.A.
84-126, while at the same time developing student
assessment plans of high quality that produce 
information which is useful locally. Assessment is a 
critical element of the objectives-assessment- 
school improvement cycle. This handbook includes 
information for districts to use in developing 
effective assessment systems. 

What Is Assessment?* 

The terms test, assessment, and evaluation are
frequently used interchangeably but, in fact, have
important differences.

Test, the narrowest of the terms, usually refers to a
specific set of questions that will be administered to
an individual or to all members of a group. It is
tangible and structured and can be administered
within a relatively limited period of time.

Assessment is more encompassing. Testing is part
of assessment, but it is only one measurement
approach. Assessment may also include other
procedures such as rating scales, observation of
student performance, individual interviews, or
reviews of a student's background or previous
performance. Assessment may refer to groups or
individuals. Group assessment may involve
administering subsets of items to different samples
of students and reporting the results for groups but
not individuals. In addition, assessment often
refers to a planned program of assessment. This
handbook is about both assessment and testing.
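
The group assessment approach described above (different samples of students answer different subsets of items, and results are reported only for the group) can be illustrated with a minimal sketch. The function names, the random assignment of items to forms, and the data layout are assumptions made for illustration only; they are not drawn from this handbook or from any state procedure.

```python
import random
from collections import defaultdict


def assign_item_subsets(student_ids, item_ids, n_forms, seed=0):
    """Split the items into n_forms subsets and give each student one subset.

    Hypothetical helper: returns (forms, assignment), where forms is a list
    of item lists and assignment maps each student to a form index.
    """
    rng = random.Random(seed)
    items = list(item_ids)
    rng.shuffle(items)
    # Deal the shuffled items out like cards so the forms are about equal in size.
    forms = [items[i::n_forms] for i in range(n_forms)]
    students = list(student_ids)
    rng.shuffle(students)
    # Rotate through the forms so each form goes to a similar number of students.
    assignment = {s: i % n_forms for i, s in enumerate(students)}
    return forms, assignment


def group_percent_correct(responses):
    """Summarize pooled (item_id, is_correct) pairs as percent correct per item.

    No student identifiers are kept, so only group results can be reported.
    """
    totals = defaultdict(lambda: [0, 0])  # item_id -> [number correct, number answered]
    for item_id, is_correct in responses:
        totals[item_id][0] += int(is_correct)
        totals[item_id][1] += 1
    return {item: 100.0 * c / n for item, (c, n) in totals.items()}
```

Because each student answers only part of the item pool, testing time per student is shorter, but the results support statements about the group only, not about individual students.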

Evaluation, as the word itself suggests, refers to
making a value judgment about the implications of
assessment data. This process is necessary for
school improvement planning. While assessment
involves obtaining performance data through a
variety of means, evaluation goes a step further by
interpreting the data from an informed perspective.
That perspective should be informed by other
information as well: for example, information
about instructional content, community context,
school climate, and dropout rate. This handbook
includes some material on the interpretation of
assessment results, but, for the most part,
evaluation is not included here.



*Section adapted from Alaska Assessment Handbook. 

In summary, testing provides one isolated glimpse,
analogous to taking a picture with a camera, of
how a student or group of students is performing on 
specific skills at a specific time. Assessment 
provides more comprehensive data on student 
performance through several administrations of 
test batteries or through various other data- 
gathering approaches. Evaluation produces value 
judgments about the results provided through 
assessment. 

A word of caution: testing, assessment, and
evaluation are strongly interdependent; the quality
of one affects the quality of the others. Good tests,
comprising sound items based on curriculum-related
objectives, strengthen assessment, and well-planned
assessment, in turn, increases the
probability of accurate evaluation by providing
sufficient and valid data.



How Does a District Develop an 
Assessment System? 

The process that a district uses to plan its assessment
program is critical to the program's success.
Naturally, that process helps determine the quality
of the system and its appropriateness for local purposes.
Another very important potential influence
of the development process is that it elicits crucial
support from people who will approve the system,
allocate resources to it, and implement it. Without
that support, the system may not have a chance.
One strategy for generating that support, involving
constituent groups in the program's development,
is the topic of the next chapter.



What Is a Comprehensive 
Assessment System? 

A comprehensive assessment system is a coordinat- 
ed plan for periodically monitoring the progress of a 
district's or school's students at multiple grade 
levels in a variety of subject areas. It specifies the 
procedures that will be used for assessment; indi- 
cates when and how those procedures will be admin- 
istered; and describes plans for processing, inter- 
preting, and using the resulting information. A 
good comprehensive assessment system includes: 

• A schedule for assessing students throughout 
the school year and at all grade levels. 

• Multiple types of assessment procedures (e.g., 
norm-referenced tests, criterion-referenced tests, 
locally developed performance rating scales) 
that are used appropriately. 

• Standards that protect the quality of student
assessment. At a minimum, those standards
refer to the reliability of the test or other assessment
procedure administration and the validity
of the interpretation and use of resulting data
(AERA, APA, and NCME, 1985).

• Provisions for collecting other relevant information
(for example, analysis of the instructional
process, judgments about local conditions/needs,
and contextual/system variables
such as mobility and dropout rates) that can be
used to supplement achievement data during
decision making.

• Plans for processing and using the assessment 
results. Such plans can help districts design as- 
sessment systems that meet information needs, 
are efficient, and minimize the paperwork 
burden on staff. 

Different Groups, Different
Roles

To play a significant role in education, an assessment
program must have the support of both the
taxpaying public which funds it and the educators
and administrators who work with it. The program
must appear credible (that is, valid and useful) to
teachers, administrators, parents, students, and
others. District administrators may develop that
credibility by involving these groups in decision
making about assessment. This chapter of the
Assessment Handbook discusses the groups that might
be involved and their role in planning, implementing,
maintaining, and evaluating assessment
programs.

To the extent possible, those who are affected by as- 
sessment results (district and school administra- 
tors, teachers, students, parents, and other com- 
munity representatives) should participate in the 
assessment's design. Each group offers a unique 
and vital perspective on which skills are most im- 
portant to assess, how to assess them, and especially 
how to use the results. 

Several issues for discussion among these assess- 
ment decision makers include: 

1. What should be the major purposes/functions
of assessment?

2. What types of assessment procedures are 
most appropriate for various goals/objec- 
tives? 

3. When should assessment occur (both time 
of year and frequency)? 

4. How should results be reported? What data
do various groups want and need?

5. How should assessment results be used? 

6. What do results indicate about student per- 
formance? 



*Chapter adapted from Alaska Assessment Handbook 

7. What revisions are needed in local objectives,
student expectations, teaching-learning
activities, or assessment procedures?

8. How can the assessment system be improved
(for example, by modifying the assessment
approaches, processing procedures,
or reporting)?

9. How can the assessment program better 
serve its users? 

Not all groups need to participate in discussions 
about all questions. Different groups contribute dif- 
ferent strengths, and these strengths should in- 
fluence how various constituent groups are in- 
volved. The chart in the Chapter Appendix shows 
how the different responsibilities might be dis- 
tributed across groups. 



Committees Serve a Variety of 
Needs 

Depending on a district's particular needs, several
different types of advisory groups should participate
in long-range assessment planning. These
could include an overall committee, task forces, and
an interpretive panel. The composition and function
of each group are discussed in the following
paragraphs.

Overall Committee 

This committee will have general responsibility for
developing the assessment system. It should be
broadly representative of the district. The committee
should probably include one or two district
administrators who will be involved in system
implementation or the use of resulting data (e.g., testing
or curriculum directors), one or two building-level
administrators, a number of teachers from different
grade levels and subject areas, and perhaps teachers'
association or union representatives and community
(e.g., school board) members.

This committee can form an important link between
local teachers and administrators and, depending
on local needs, the link may be broadened to include
others. Throughout the planning process, the committee
should inform others of its work and invite
their suggestions. The committee will probably
want to involve other staff more directly by establishing
special-purpose task forces.



One of the committee's first tasks will be to develop
a framework for the assessment system. That
framework should include the functions or purposes
of the system and the types of assessment procedures
to be used. (Several functions or purposes that
committees may want to consider are listed in the
Chapter Appendix. Types of assessment procedures
are discussed in Chapter 3.) The framework can
then be used, by the committee as a whole or by
task forces or subcommittees, to guide the development
of more specific assessment plans.

Task Forces 

The number, composition and major functions of 
the task forces will vary according to local factors 
such as development needs, district size, and type of 
district (elementary, secondary, or unit). The func- 
tions of the task forces may include selecting or de- 
veloping assessment procedures for various learn- 
ing areas/grade levels/goals or designing specific as- 
sessment system elements such as procedures for 
processing or reporting the assessment data. 

Interpretive Panel 

An interpretive panel can improve the use of assessment
results by reviewing them and suggesting
what the results reveal about student performance.
The interpretive panel can examine the numerous
complex factors affecting student performance and
help audiences understand the relationship among
curriculum, instruction, and assessment. They can
provide valuable guidance about school improvement.

The School Board's Role 

A well-informed school board can be one of the best 
allies of any assessment program. Proactive admin- 
istrators keep their boards closely involved 
throughout the planning and implementation of as- 
sessment programs. Regularly scheduled board 
review can help ensure that testing practices 
remain responsive to a district's changing needs. 

At each stage of the planning, implementation and 
review of an assessment program, a skillful admin- 
istrator provides the board with relevant decision- 
making information. The information is more likely 
to be used if it is timely, complete, easily under- 
stood, and targeted to the decisions at hand. 

During the initial planning stages of an assessment
program, the board should be educated about:

• the potential purposes of assessment, 

• the limitations of assessment, 

• state assessment requirements, 

• components of an effective assessment program, 

• estimated costs for various assessment options, 
and 

• proposed procedures for test selection or 
development. 

After an assessment program has been implement- 
ed, the board should be kept informed about it. 
When test results are available, the board should re- 
ceive a report of district performance. This report, 
delivered prior to public reporting of the scores, 
should be designed to help members understand the 
results and to prepare them for questions/comments 
from the public. 

The assessment program should be reviewed
regularly: whenever students do not meet locally
defined expectations in one or more learning areas
(as indicated in School Improvement Plans) or according
to a pre-established schedule. The review
should include a staff analysis of the appropriateness
and usefulness of the assessment procedures.
Also, various audiences can be surveyed about their 
perceptions of the program's effectiveness. Staff 
recommendations for changes in the assessment 
program can be presented to the board, along with 
survey results. If the board members have been 
provided with appropriate information since the 
beginning of the assessment program, decisions 
made at this point should be especially sound. 



Practical Tip 

Why does assessment generate so much
controversy? One important reason is that
people (including teachers) feel they were not
part of the process, that decisions were made
without their knowledge, leaving them with only
a "take it or leave it" option. Some fear that inappropriate
assessment procedures will produce
misleading results that make them look bad.
When all parties are involved from the beginning,
assessment programs should be better
and the amount of criticism reduced.



Special Considerations 

Members of all committees, task forces, and panels 
should be selected carefully. Whether they are 
asked to serve or selected from a pool of volunteers, 
they should be people who are: 

• interested in the development (or interpretive) 
task, 

• knowledgeable about the educational program,

• supported and respected by district teachers 
and administrators, and 

• able to devote time to the task. 

Developing a comprehensive assessment system is
time-consuming. However, it is a very important
task which must be done with care. District administrators
may need to identify strategies that make
it easier for school personnel to participate. For
example, they might schedule meetings during in-service
or institute days, hire substitutes, ask other
teachers to cover participants' classes (and perhaps
reward them for doing so), or pay staff for working
during the summer.



References 

Woodstock Community Unit District #200. (1986). Student Assessment System. (Available at Educational

Zion Elementary School District #6. (1986). An Aligned Instructional Guide/Assessment Management System for
Student Learner Outcomes. (Available at Educational Service Centers.)

Appendix A: Chapter 2
Suggested Roles of Constituent Groups in Assessment Planning

TEACHERS

• Determination of Assessment Functions/Purposes: Propose assessment
purposes and priorities.

• Selection or Development of Assessment Procedures: Review standardized
tests/write items/develop other assessment procedures.

• Implementation of Assessment System: Administer assessment.

• Interpretation of Assessment Results: Help interpret results; suggest
appropriate school improvements.

• Evaluation of Assessment System: Review system and suggest
modifications.

ADMINISTRATORS

• Determination of Assessment Functions/Purposes: Review proposed
assessment purposes and priorities; develop proposal for school board.

• Selection or Development of Assessment Procedures: Oversee assessment
development.

• Implementation of Assessment System: Oversee assessment
administration and reporting of results.

• Interpretation of Assessment Results: Review school and district
results; develop local improvement plans and report results.

• Evaluation of Assessment System: Review and modify assessment system.

CURRICULUM COORDINATOR/OTHER INVOLVED STAFF

• Determination of Assessment Functions/Purposes: Review proposed
assessment purposes.

• Selection or Development of Assessment Procedures: Review standardized
tests/participate in other assessment development.

• Implementation of Assessment System: Assist with assessment
administration; handle processing and reporting.

• Interpretation of Assessment Results: Help interpret results and advise
school improvement decisions.

• Evaluation of Assessment System: Review system and suggest
modifications.

SCHOOL BOARD MEMBERS

• Determination of Assessment Functions/Purposes: Critique proposed
assessment priorities.

• Selection or Development of Assessment Procedures: Approve proposed
general assessment procedures.

• Interpretation of Assessment Results: Review assessment results and
improvement plans.

• Evaluation of Assessment System: Review system and approve suggested
modifications.

PARENTS/OTHER COMMUNITY MEMBERS

• Determination of Assessment Functions/Purposes: Suggest assessment
priorities.

• Interpretation of Assessment Results: Review assessment results.

• Evaluation of Assessment System: Review system and suggest
modifications.



Appendix B: Chapter 2 
Assessment Purposes/Functions 



Early in the process of developing an assessment 
system, local planners need to decide what func- 
tions the system should serve, a decision which 
directs the remainder of the process and provides a 
basis for formal board policy statements regarding 
assessment. Some assessment functions/purposes 
that may be considered include: 

• To meet the requirements of P.A. 84-126 (assessing
third, sixth, eighth, and eleventh grade
students on local learning objectives in the six
fundamental learning areas annually).

• To help assure that students meet local objec- 
tives by monitoring their progress more 
frequently. 

• To meet accountability expectations by collecting
achievement data that permit comparisons
of local students with national norms.

• To make better school improvement decisions. 

• To monitor student progress on classroom level 
instructional objectives. 



• To diagnose student needs. 

• To place students in special classes or instruc- 
tional groups. 

• To provide information for career or psychological
guidance.

• To obtain college entrance examination data. 

• To evaluate special programs. 

Decision makers should consider these and other 
potential assessment functions carefully. They may 
want to limit the purposes of a districtwide assess- 
ment system, for example, to focus on the first four 
functions listed above and leave the others to the 
discretion of teachers, guidance counselors, or other 
local personnel. However, they may also want to
consider other functions in order to minimize the
number of assessment procedures that must be
administered by selecting those which can serve
multiple functions (for example, a norm-referenced
test that can help assess local objectives and meet
accountability expectations and Chapter 1 reporting
requirements).






Chapter 3: Types of Assessment








After identifying the functions of assessment, com- 
mittees should decide what general types of assess- 
ment procedures are most appropriate for each. 
Later, task forces can select or develop more specific 
assessment plans. This chapter describes various 
types of assessment procedures; the uses, strengths, 
and weaknesses of each, and the special considera- 
tions they require (summarized in Table 3.1). The 
assessment procedures discussed here correspond 
with those on the Learning Assessment Plan (LAP) 
form (ISBE 41-78): 

• Publisher's standardized shelf tests.

• Publisher's customized tests.

• Publisher's textbook tests.

• District's locally developed tests.

• Uniform procedures requiring written performance
by students scored according to a uniform
rating scale.

• Uniform procedures requiring other (nonwritten)
performance by students scored according to a
uniform rating scale.

The first four procedures refer to what is commonly 
known as forced-choice testing approaches. That is, 
students take paper-and-pencil tests and select the
correct response from two or more alternatives 
(e.g., multiple-choice, true-false, or matching test 
items) or supply a word or short phrase to answer a 
question or complete a statement. Students' scores 
are likely to be comparable regardless of which 
teachers administer and score the test. The last two 
types require student performance. Here, uniformi- 
ty is emphasized in order to reduce the variance 
which normally occurs when teachers judge student 
performance, but do not use the same criteria or 
standards. Without uniformity, the scores of stu- 
dents with different teachers cannot be summarized 
together or compared. Due to the need for districtwide
student assessment data, all teachers in a district
must use the same instructions to elicit student
performance and the same criteria and standards
for rating the performance.

Publisher's Standardized Shelf
Tests

Most tests that districts know as standardized 
achievement tests are publisher's standardized 
shelf tests. (Examples are listed in the instructions 
for the LAP forms.) Published norm-referenced and 
criterion-referenced tests are both in this 
classification. 

Publisher's standardized shelf tests can serve many
different purposes. Sometimes selected items or
subtests can be used to assess student achievement 
of local objectives. Generally, data from these tests 
will help meet accountability needs, inform school 
improvement, and meet reporting requirements for 
special programs. Sometimes these tests can also 
provide information for diagnosing student needs 
or making student placement decisions. 

Local planners who use published tests to assess
local objectives may decide to select individual 
items, eventually clustering the items to measure 
particular objectives or sets of objectives. Or, they 
may decide to adopt particular subtests. 
Occasionally, an entire published test may be 
appropriate for assessing local objectives. 

A major advantage of publisher's standardized 
shelf tests is their multiple potential uses, but the 
tests have several other advantages also. They are 
likely to have been professionally constructed. 
They have probably been reviewed carefully, 
pilot-tested, and analyzed for bias. Also, information
about reliability and validity is usually
available in technical manuals and other sources.
Item-difficulty data, which can be very useful in 
deciding how to use an item and in setting student 
expectations, may also be readily available. Scoring 
and reporting services can be obtained. 

These tests must be used carefully, however. Local 
staff should screen the items closely to determine 
their appropriateness for measuring student 
achievement of local objectives. Test publishers 
may provide information to help with this task (for 
example, lists of items which assess particular 
objectives). However, teachers should still examine 
the items to find out whether they agree with the
test publisher's interpretation of the objective and 
of what the items measure. And, local planning 
groups need to make sure that the test results can 
be aggregated as needed (that the publisher can 
provide student performance data for the specific 
clusters of items that measure particular local 
objectives) and that the district orders a scoring 
package which will include that information. 
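
The aggregation described above, in which clusters of test items are summarized by the local objective they measure, can be sketched in a few lines. This is a minimal illustration under assumed names and an assumed data layout (an item-to-objective mapping prepared by local staff, and true/false item responses per student); it is not a publisher's scoring package or a required reporting format.

```python
def objective_scores(item_to_objective, student_responses):
    """Percent of item responses answered correctly, per local objective.

    item_to_objective: dict mapping item_id -> local objective label
        (hypothetical mapping produced when staff screen the items).
    student_responses: dict mapping student_id -> {item_id: True/False}.
    """
    correct = {}
    attempted = {}
    for answers in student_responses.values():
        for item_id, is_correct in answers.items():
            objective = item_to_objective.get(item_id)
            if objective is None:
                continue  # item was not matched to any local objective
            attempted[objective] = attempted.get(objective, 0) + 1
            correct[objective] = correct.get(objective, 0) + int(is_correct)
    return {obj: 100.0 * correct[obj] / attempted[obj] for obj in attempted}
```

Items that local staff did not match to an objective are skipped, which mirrors the caution that only items judged appropriate should count toward a local objective.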



Publisher's Customized Tests 

Customized tests are tests which publishers tailor
to local needs. The publishers often maintain banks
of objectives and test items to measure the
objectives. A district selects objectives and the
publisher identifies relevant items and assembles
them into a test.

Customized tests can be particularly useful for 
assessing local learning objectives. They have other 
advantages also, many of which are similar to those 
of standardized shelf tests. Generally, the items 
have been professionally developed. Data may be 
available about validity, reliability, and item 
difficulty. Scoring and reporting services are 
available. 

However, districts should be cautious about using 
this assessment option. The tests (and related 
scoring and reporting services) can be very 
expensive. The objectives that are selected from the
publisher's bank may not be sufficiently aligned 
with local objectives. Local personnel should 
examine individual items and consider their 
appropriateness for measuring specific local 
objectives, as well as the level of knowledge 
assessed. Like other standardized published tests, 
customized tests are generally limited to 
forced-choice (e.g., multiple-choice or true-false) 
items that can be readily scored by machine. They 
may not be valid for assessing some skills in areas
such as writing, speaking, fine arts, and physical
development.

Publisher's Textbook Tests 

Textbook tests are the tests that accompany 
textbooks. They may be particularly appropriate for 
assessing student progress on local objectives, 
including classroom level instructional objectives. 
They have several advantages: they are readily
available; usually closely aligned with instruction;
and inexpensive. Districts do not have to spend
additional money to purchase them or time to 
develop them. 

Before deciding to use textbook tests to measure 
local objectives, local planners should take several 
factors into consideration. The test items should be
aligned with local objectives. (Since districts will 
report information about student achievement of 
objectives, aligning tests only with instruc- 
tion— i.e., use of the textbook— may not be 
sufficient.) Districts may need to ask textbook 
publishers for additional information about test 
items. For example, what is known about the items' 
reliability, validity, and difficulty level? Have the 
items been pilot-tested or reviewed for bias? 
Districts may also need to establish procedures for 
duplicating the items and scoring them uniformly. 

District's Locally Developed Tests

An attractive alternative, locally developed tests 
may be the most appropriate type of procedure for 
assessing many local objectives. Since locally devel- 
oped tests will be administered on a smaller scale 
than publishers' tests, they may permit more oppor- 
tunities for such alternatives as having students 
listen to audio recordings as they respond to 
multiple-choice questions about music, using video- 
tapes with questions about dance or drama, or 
simulating scientific experiments on computers. 
Also, some local audiences may consider these tests 
more credible and use the results more. 

Developing local tests is a very ambitious 
undertaking, however. The process is very 
time-consuming. Writing good assessment items is 
difficult. Districts should develop several items for 
each local objective. Local staff must address issues 
such as reliability, validity, nondiscrimination and 
item difficulty. They will also need to develop 
processes for printing, scoring, and reporting. 

Uniform Procedures Requiring 
Written Performance by 
Students Scored According to a 
Uniform Rating Scale

Students' writing skills may be tested by an
assessment procedure in which students are given a 
standardized writing assignment and trained 
raters assess the results using a common scale. 
(The Illinois Writing Assessment Program 
illustrates this approach. It has been used widely 
and tested extensively. Standardized prompts and 
uniform rating scales exist. See Write on, Illinois!, 
ISBE, 1987.) This type of assessment may represent 
the most effective procedure for assessing students' 
writing skills. Furthermore, it can be useful 
instructionally. 

The procedure is somewhat demanding and
time-consuming. Local planners will need to obtain
or develop writing prompts and to ensure that 
raters are trained. (Fortunately, resources and 
technical assistance are available in Illinois 
through the Educational Service Centers.) The 
rating process should be monitored periodically to 
determine that it is being applied uniformly. 
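
One simple way to monitor whether a rating scale is being applied uniformly is to have two trained raters score the same set of papers and compare their ratings. The sketch below computes exact and adjacent (within one scale point) agreement rates. The function name, the choice of these two statistics, and the one-point tolerance are assumptions for illustration; this handbook does not prescribe a particular monitoring statistic.

```python
def rater_agreement(ratings_a, ratings_b):
    """Return (exact, adjacent) agreement rates for two raters.

    ratings_a, ratings_b: equal-length lists of integer scale ratings that
    two raters gave to the same papers, in the same order.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Exact agreement: both raters gave the same scale point.
    exact = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    # Adjacent agreement: ratings differ by no more than one scale point.
    adjacent = sum(1 for a, b in zip(ratings_a, ratings_b) if abs(a - b) <= 1)
    return exact / n, adjacent / n
```

A district could track these rates over time; a drop in agreement may suggest that raters need retraining or that the scale descriptions need clarification.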



Uniform Procedures Requiring 
Other (Nonwritten) Performance 
by Students Scored According to 
a Uniform Rating Scale 

In performance assessment procedures, students 
demonstrate their achievement in a particular area
by performing a specified task. For example, they 
might draw a sketch, play a musical instrument, or 
run a 100-yard dash. The specific assignment 
should be described in the local assessment procedures,
along with instructions for rating the performance
uniformly. (As with all locally developed
assessment procedures, districts should maintain 
centrally located files in which performance 
stimuli/assignments and uniform rating scales are 
described thoroughly.) 

The performance might be live or recorded. For 
example, a student might perform a dance or play, 
conduct a laboratory experiment, or compete in an 
athletic contest as a teacher rated his or her 
performance. Or, the performance might be 
videotaped so that the teacher could view it later. 
Other forms of recorded performance might include 
portfolios of sketches, photographs, or choreogra- 
phy; audio recordings of speeches or musical 
performance; and works of art such as paintings or 
sculptures. 

A major advantage of this type of procedure is its 
effectiveness in assessing some objectives,
especially in the fine arts and physical development
and health. Many objectives that refer to student 
performance cannot be assessed validly using 
paper-and-pencil tests. 

Performance assessment procedures must, of 
course, be used uniformly. Standardized directions 
for eliciting student performance and scales for 
rating that performance must be developed. 
Teachers must be trained in their use. Without
uniform administration and scoring practices, the
results will not be useful to anyone other than
individual teachers, and the aggregation of data across
students or schools (which is required, for example,
for assessing local status on learning objectives)
is not justified. The additional effort that uniformity
requires, however, may be far outweighed by
the meaningfulness and utility of the results.

Table 3-1

Advantages of and Cautions Regarding
Various Types of Assessment Procedures

Publisher's standardized shelf tests

  Advantages:
  • Useful for many purposes.
  • Professionally constructed, piloted, and reviewed for bias.
  • Validity, reliability and item-difficulty data probably available.
  • Scoring and reporting services available.

  Cautions:
  • Require careful examination of alignment with local objectives.
  • Number of items that assess individual objectives may be insufficient.
  • Require attention to ensure that test results are aggregated and displayed appropriately.

Publisher's customized tests

  Advantages:
  • Should be closely aligned to local objectives.
  • Professionally constructed, pilot-tested, and reviewed for bias.
  • Validity, reliability and item-difficulty data may be available.
  • Scoring and reporting services available.

  Cautions:
  • Require examination of appropriateness and quantity of items that assess local objectives.
  • Test construction, scoring, and reporting services may be very expensive.
  • May be limited to paper-and-pencil multiple-choice items.

Publisher's textbook tests

  Advantages:
  • Probably closely aligned to instruction.
  • Readily available.
  • Inexpensive.

  Cautions:
  • Require examination of alignment with local objectives.
  • Information about validity, reliability, and item bias not readily available.
  • Procedures for printing, scoring and reporting uniformly may have to be developed locally.

District locally developed tests

  Advantages:
  • Especially appropriate for assessing local objectives.
  • Local credibility may increase use.
  • Small scale allows flexibility beyond paper-and-pencil measures.

  Cautions:
  • Difficult, time-consuming process.
  • Validity, reliability and item bias must be examined locally.
  • Procedures for printing, scoring, and reporting must be developed locally.

Uniform procedures requiring written performance by students scored according to a uniform rating scale

  Advantages:
  • Particularly effective for assessing writing skills.
  • Results are useful instructionally.
  • Technical assistance available through ESCs.

  Cautions:
  • Raters must be trained.
  • Rating process is time-consuming.

Uniform procedures requiring other (nonwritten) performance by students scored according to a uniform rating scale

  Advantages:
  • Especially appropriate for assessing some objectives.

  Cautions:
  • Difficult to develop; standardized prompts must be written, uniform rating scales developed, and raters trained.
  • Validity, reliability and bias may have to be examined locally.

• Steps toward Improving Alignment

• Special Issues to Be Resolved

• A Summary of Advantages



Alignment: A Definition 

Alignment simply means matching. In an educational
context, it has numerous applications and may
refer to a match between:

• local objectives and assessment, 

• local objectives and teaching-learning activities,

• teaching-learning activities and assessment,

• state goals and local objectives, or

• objectives at one grade or school level and objectives
at another level.

In the broadest sense, alignment means coordination
among all the elements of objectives/curriculum,
instruction, and assessment. A well-functioning
alignment provides smooth, cyclical transitions
from planning to instruction to assessment,
remediation, and enrichment to evaluation
and then back to planning. Everything works 
together. The elements are coordinated between ele- 
mentary and secondary schools, as well as between 
grade levels in each. 

Many factors affect the extent of alignment within 
a district, but successful alignment is most evident 
in districts where: 

• communication among teachers and admin- 
istrators at all levels is open and functional, 
that is, communication channels are purpose- 
fully used to support or increase alignment; 

• educational goals and objectives are written, 
coordinated across grade levels, and well known 
to, and supported by, teachers and administra- 
tors at all levels. 



*Chapter adapted from Alaska Assessment Handbook. 






Of course, just because a district has goals and objectives
in writing does not necessarily mean that
teachers are teaching those things, nor that tests 
are measuring what teachers are teaching. Admin- 
istrators and teachers must monitor alignment to 
ensure that it is occurring. 

The primary interest here is the integration of assessment
procedures with objectives and instruction.
Because these three areas are so intertwined,
this chapter discusses the overall issue of 
alignment. 

Steps toward Improving 
Alignment 

Once a district has decided to improve alignment, 
how does it proceed? District staff must recognize 
that alignment is a dynamic process involving 
many continuously changing factors. A common 
sense approach is to think of alignment as a con- 
tinuum. It can always be improved and works 
better in some circumstances than others. 

Given that basic understanding, local staff may
follow these steps toward improving alignment: 

1. Review state goals and district learning objec- 
tives, looking for consistency among grade 
levels, an ordered progression from one level to 
the next, and appropriate tasks for each grade
level. 

2. Write/review/revise objectives to match what
the goals indicate will be achieved. 

3. Analyze the objectives to determine precisely 
what should be introduced, reinforced, 
reviewed, or mastered at each grade level. 

4. Ensure that all tasks identified through Step 3 
can be covered by instruction and that perfor- 
mance on these tasks can be assessed through 
testing or other viable procedures (e.g., 
classroom observation). Consider writing some 
sample test items to identify objectives that are 
too broad. 

5. Devise a plan for the selection of textbooks and 
other materials to support knowledge and skill 
development in specified areas. 

6. Outline a long-range plan for assessment that 
satisfies the district's specific needs. 



Practical Tip 

Alignment is most beneficial when all
elementary and secondary grade levels are
involved. That way, programs and expectations
of students are more coordinated and consistent
from level to level. Alignment should involve
teachers from all elementary and secondary
schools that students are scheduled to
attend, even if that means working with
teachers from another district.



7. Establish procedures for test selection and/or
assessment development. The procedures
should emphasize criteria that foster alignment
(i.e., a match between test content, objectives,
and teaching-learning activities). Assign
responsibilities for assessment selection,
development, and review.

8. Review current instructional plans (including 
Learning Assessment Plans) and assessment 
data to identify which goals and objectives are 
being addressed well, poorly, or not at all. 

9. Recommend new instructional strategies that 
support alignment (e.g., integrating science and 
math instruction to develop students' math
skills through science problems such as farm 
management). 

10. Survey teachers to determine staff development 
needs. Do teachers need more information and 
skills related to assessment? Teaching across 
learning areas? Writing measurable objectives? 
Designing instruction for selected objectives? 

11. Design staff development based on the findings 
of Step 10. 

12. Provide forums for representatives of various 
groups (teachers, administrators, and communi- 
ty members) to share their perceptions about 
objectives, instruction, and assessment at each 
grade level. 

13. Review local expectations for student achievement.
Are they still appropriate after assessment
data are reviewed and objectives or instruction
are revised? Or should the expectations
be raised or lowered?

14. Ensure that the assessment program adequately
measures the achievement of expectations;
ensure that the instructional program gives students
the skills and knowledge needed to meet
expectations.

15. Review local remedial and enrichment pro- 
grams. Are placement procedures consistent 
across the district? Are those procedures directly 
related to instructional and testing programs? 



Practical Tip 

Many districts wrestle with the issue of
alignment. In fact, chances are good that a
neighboring district has developed an alignment
procedure that might be a good starting point. If
so, that district may have already tried the procedures
and know what works and what doesn't. Other
districts can adopt the good and correct
the bad with much less effort than
starting from scratch.



16. Make alignment a high-priority, inherent part of
future planning. Consider all perspectives
(those of teachers, district curriculum
coordinators, and other content or measurement 
specialists). 

Districts cannot expect to go through these steps 
for all learning areas in a year. But they should de- 
velop plans for systematically and strategically 
improving this alignment of curriculum, instruc- 
tion, and assessment even if it takes several years. 
The advantages for students and educators alike 
are well worth the effort. 

The appendix includes a checklist which can be 
used in several alternative ways: 

—as a survey to give to teachers and others early 
in the alignment process to help identify local 
needs, 

—as a worksheet for the local planning group to 
use in examining the current status of align- 
ment locally, and/or 

—as a survey or worksheet to use after several 
years have passed to help determine what has 
been accomplished and to identify current needs. 



Districts may want to adapt the checklist to fit
local conditions and to consider other types of alignment
also. For example, they may want to examine
the compatibility between local educational philosophy
and objectives/instruction, objectives and instructional
processes, or assessment purposes and
procedures.

Special Issues to Be Resolved 

Regardless of how successfully or smoothly a dis- 
trict may handle its alignment efforts, certain prob- 
lems can arise. Two common problems and possible 
solutions are described below. 

• Problem: Maintaining the autonomy of each
level in the education system. What if elementary
and secondary teachers see themselves as
very different in philosophy or approach? Do 
secondary staff have the right to dictate what 
should be taught or emphasized? Consistency 
of curriculum across levels is critical, and when 
one level dictates to or directs another, the 
spirit of cooperation necessary for efficient 
alignment may be lost. 

Solution: Provide a forum for discussion
among representatives of all levels. No single
level should take the lead in setting educational
objectives or priorities. Educational objectives
and curriculum should reflect the emphases
that educators from all levels view as critical.



• Problem: Ensuring that testing plays a realistic 
role. Since some important educational out- 
comes are not measured through tests, good 
alignment does not demand that everything in 
the curriculum be tested formally. Misunder- 
standing of this can lead to overtesting which, 
in turn, leads to other problems such as schedul- 
ing conflicts, student anxiety, and staff 
resistance. 

Solution: Alignment policies should not over- 
emphasize assessment; the match between ob- 
jectives and instruction is just as important as 
the match between objectives and assessment. 
Clearly state how objectives should be selected 
for assessment. Emphasize other valid ways of 
assessing students' competence, including careful
classroom observation.



Practical Tip 

Do top administrators understand and support
alignment efforts? A lot of work can be wasted if
alignment efforts don't have consistent support.

A Summary of Advantages

The potential advantages of alignment may already
be apparent. They include:

• Improved communication,

• Better problem solving through coordinated
effort,

• Smoother transition for students (between
grades and between school levels),



• Improved instruction (through consistency), 
and 

• Increased satisfaction and productivity as a 
result of a focused, consistent effort. 

These advantages suggest alignment's potential 
impact on students. When objectives, instruction,
and assessment are aligned, a school's mission is
more likely to be accomplished. Since, presumably, 
that mission is based on student needs, students are 
the prime beneficiaries of the alignment process.

Checklist for Determining Alignment

Various criteria have been developed for gauging a district's alignment of curriculum, instruction, and assessment.
The following checklist is one approach to determining alignment. Use the checklist to judge your
district's progress toward improving curriculum alignment. Circle a number beside each statement to show
how true it is in your district, using the following scale:

4 = Very true in the district 

3 = Somewhat true in the district 

2 = Mostly untrue in the district

1 = Completely untrue in the district

0 = Don't know whether it is true in the district 



Goals and Objectives 

1. Clear student learning objectives have been established
and put in writing.  0 1 2 3 4

2. The objectives are coordinated across school and grade
levels,* with an ordered progression from one level to the
next.  0 1 2 3 4

3. The objectives match what the State goals indicate will be
achieved.  0 1 2 3 4

4. The district objectives are known and supported by teachers
and administrators at all levels.  0 1 2 3 4

Instruction

5. Teachers design their instruction to match the district's
objectives.  0 1 2 3 4

6. Textbooks and other materials support the skills and
knowledge needed to meet the objectives.  0 1 2 3 4

7. Teachers are provided with inservice activities that support
the alignment process.  0 1 2 3 4

8. The needs of special groups of students (bilingual, special
education, gifted and talented, etc.) are addressed.  0 1 2 3 4

Assessment

9. Assessment consistent with district objectives occurs on a
regular basis.  0 1 2 3 4

10. Assessment is matched to course content and materials.  0 1 2 3 4

11. Assessment is matched to classroom instruction.  0 1 2 3 4

12. Assessment results are used to evaluate which goals and
objectives are being achieved and which are not.  0 1 2 3 4



*Even in non-unit districts, the objectives are coordinated between elementary and high school districts.

Communication

13. Open and functional communication exists among educators
at all levels within the district.*  0 1 2 3 4

14. Assessment results are clearly communicated to the public.  0 1 2 3 4

15. Forums allow interested groups (e.g., parents and teachers)
to share ideas about objectives, instruction, and assessment.  0 1 2 3 4

16. Teachers at one level or in one content area work closely
with their colleagues at other levels and in other subject
areas to achieve common educational goals.  0 1 2 3 4

When you have finished assigning ratings of 0 to 4 to each of the criteria, go back and find all the criteria
that have ratings of 0, 1 or 2. Circle the numbers of these "troublesome" criteria below.



1   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16







Now it's time for an honest appraisal. Can you improve the situation associated with the problem criteria
you circled above? Talk with others as you decide which of the problems can be eliminated and which will
be present throughout the alignment effort.

Next, circle the appropriate numbers for all those criteria that cannot be met, even with special effort. 



1   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16







You may not have to circle any numbers, but, more likely, one or two important components of the alignment
effort just won't fall into place. You can learn to live with these shortcomings, but it is important
for everyone associated with the alignment effort to know that these problems exist.
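
For districts that record checklist responses electronically, the two circling steps can be sketched as follows. The ratings, the function name, and the set of criteria judged impossible to meet are all hypothetical; deciding which problems cannot be fixed remains a judgment reached through discussion, not something the code determines.

```python
def troublesome_criteria(ratings, threshold=2):
    """Return the criterion numbers rated 0, 1 or 2 (at or below `threshold`)."""
    for number, rating in ratings.items():
        if rating not in (0, 1, 2, 3, 4):
            raise ValueError(f"Criterion {number} has an invalid rating: {rating}")
    return sorted(number for number, rating in ratings.items() if rating <= threshold)


# Hypothetical ratings for the 16 criteria, keyed by criterion number.
ratings = {1: 4, 2: 3, 3: 4, 4: 2, 5: 3, 6: 4, 7: 1, 8: 3,
           9: 3, 10: 4, 11: 3, 12: 0, 13: 2, 14: 3, 15: 1, 16: 3}

flagged = troublesome_criteria(ratings)

# Criteria the planning group decided cannot be met even with special effort
# (hypothetical outcome of the group's discussion).
cannot_be_met = {7, 12}
can_improve = [n for n in flagged if n not in cannot_be_met]
persistent = [n for n in flagged if n in cannot_be_met]

print("Troublesome criteria:", flagged)
print("Can be improved:", can_improve)
print("Known shortcomings to communicate:", persistent)
```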



*In non-unit districts, open and functional communication also exists between elementary and high school
district educators.

Purposes of Standardized
Norm-Referenced Tests 

For most school districts in Illinois— and in the 
United States— standardized norm-referenced tests 
(NRTs) are an important part of district assessment 
programs. These commercially available tests were 
prepared by measurement experts. When uniform 
procedures are used to administer and score the 
tests, local student performance can be compared to
national norm groups who were tested using the
same instrument.

This ability to compare test results makes standardized
tests more useful for some purposes than other
types of test. In situations where students or student
groups must be ranked (e.g., for some accountability
and evaluation decisions or for selection of
students for special programs), a well chosen stan- 
dardized NRT should be seriously considered. Stan- 
dardized tests are appropriate in accountability and 
evaluation when the question of interest is "How 
are our students doing compared to similar students 
throughout the nation?" When the question is "Are 
our students learning the skills and obtaining the 
knowledge we say we are teaching them?", stan- 
dardized tests are less appropriate. 

Most districts use standardized test batteries 
which cover a broad range of basic skills content, al- 
though standardized diagnostic and single-subject 
tests with associated norms are available. Many districts,
particularly larger districts, also use an aptitude
test which was normed on the same sample as
the publisher's achievement battery. This allows a
two-way comparison of their students' performance:
externally (compared to similar students nationwide)
and internally (compared to their own expected
achievement as determined by the aptitude
measure). In some districts, this twofold comparison
is an efficient use of testing resources for
gathering a maximum amount of information. 



*Chapter adapted from Alaska Assessment Handbook

To be valid and beneficial, standardized achievement
tests must measure the appropriate skills and
knowledge that, generally, are described in local
learning objectives. The previous chapter included
suggestions for improving that alignment. Guidelines
for selecting standardized tests that are most
useful locally are provided in this chapter.

Criteria for Selecting 
Standardized Tests 

District planning groups may use many different 
criteria to guide their selection of standardized 
tests. Suggested criteria are described
here. The chapter appendix includes worksheets
that local groups can use, perhaps with adaptations
to fit local conditions, in the selection process. The
criteria are categorized into three types of selection
considerations: alignment, technical, and practical.

The first worksheet, "Content Validity of Standardized
Tests," is for examining the alignment between
local objectives and a particular standardized test,
as discussed in the next section. Information from
that worksheet will be used to complete the first
section of the second worksheet, "Rating Sheet for
Standardized Tests."



Alignment Considerations 

Alignment criteria are critical in test selection. 
Standardized tests that are not closely aligned to 
local objectives should be eliminated from further 
consideration. 

One important question, which should be answered
"yes" if a test series is to receive further
consideration, is whether a majority of the test
items match local learning objectives. If not, it
would be a very inefficient instrument to use: it
would provide the comparative information desired
from an NRT but would be of little help in curriculum
evaluation.

The next consideration is the proportion of objec- 
tives which are measured by test items. Local plan- 
ners may decide that a test is inefficient if it as- 
sesses only a small percentage of local 
objectives— unless the test measures important ob- 
jectives very well and can be used for multiple pur- 
poses. This criterion may also be an indicator of the 
level of specificity of local objectives. If the percentage
is extremely small for all the test series considered,
local objectives may be too specific and need to
be broadened. 
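
The first two alignment questions can be illustrated with a small tally. This is a sketch only: the objective codes (M1 to M5), the item-to-objective mapping, and the function name are hypothetical, and a real review would take the mapping from the completed content validity worksheet.

```python
from collections import Counter


def coverage_summary(item_to_objectives, local_objectives):
    """Summarize how a test's items line up with local objectives.

    item_to_objectives: dict of item number -> list of matched objective codes
                        (an empty list means the item matches no local objective)
    local_objectives:   iterable of all local objective codes
    """
    objectives = set(local_objectives)
    if not item_to_objectives or not objectives:
        raise ValueError("Need at least one test item and one local objective.")
    matched_items = 0
    items_per_objective = Counter()
    for matched in item_to_objectives.values():
        hits = objectives.intersection(matched)
        if hits:
            matched_items += 1
        items_per_objective.update(hits)  # count each objective once per item
    return {
        "pct_items_matched": 100 * matched_items / len(item_to_objectives),
        "pct_objectives_measured": 100 * len(items_per_objective) / len(objectives),
        "items_per_objective": dict(items_per_objective),
    }


# Hypothetical review of a 10-item test against five local objectives.
mapping = {1: ["M1"], 2: ["M1"], 3: ["M2"], 4: [], 5: ["M1"],
           6: ["M3"], 7: [], 8: ["M2"], 9: [], 10: ["M1", "M3"]}
summary = coverage_summary(mapping, ["M1", "M2", "M3", "M4", "M5"])
print(summary)
```

In this made-up case a majority of items match local objectives, but two of the five objectives are not measured at all, which is the kind of tradeoff the planning group would need to weigh.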



If an NRT is used to assess individual objectives for
accountability or diagnostic purposes, at least 3-5
test items per objective (and the more, the better)
are recommended. With fewer items, a student 
could guess the correct answer and appear to have 
"mastered" the objective. 

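The effect of guessing can be illustrated with a short calculation. The sketch assumes four-option multiple-choice items, blind guessing, and independent items; none of these assumptions comes from the handbook, and real students rarely guess completely at random.

```python
from math import comb


def p_guess_at_least(k, n, choices=4):
    """Probability of answering at least k of n items correctly by blind guessing."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n.")
    p = 1 / choices
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Chance of answering every item for an objective correctly by guessing alone.
for n_items in (1, 3, 5):
    print(n_items, "item(s):", p_guess_at_least(n_items, n_items))
```

Under these assumptions, the chance of appearing to have mastered an objective purely by guessing drops sharply as items are added, which is the reasoning behind recommending several items per objective.
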
The final alignment consideration is whether test 
items and district objectives receive comparable 
emphasis. If a district uses the worksheet in the ap- 
pendix to determine content validity, the more im- 
portant local objectives should appear in the ap- 
propriate column more often than less important 
objectives. 



Technical Considerations 

Good reliability, i.e., consistency of test scores, is a
necessary, but certainly not sufficient, characteristic
of a test. How much is "good"? A test whose
reliability is lower than .80 (the theoretical limit is
1.0) should be viewed with skepticism; .90 or better
is commonly achieved by national norm-referenced
basic skills tests.

The higher the reliability, the more confidence a 
district can have in test scores. This philosophy sug- 
gests that high reliability is needed in those tests 
which are used for making important decisions and 
is especially important when reporting individual 
student scores rather than group scores. 

Subtest reliabilities are most often lower than total 
test reliabilities because reliability depends a great 
deal on the number of items included in a score. 
Therefore, not every subtest reliability will be in 
the .90 range. Still, a test that has many low subtest
reliabilities should be avoided.
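
The link between the number of items and reliability is often illustrated with the Spearman-Brown formula. The handbook does not cite this formula; it is used here only as an illustration, and it assumes that the items in the shorter and longer tests are comparable. The reliability value and length factor in the example are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by `length_factor`.

    reliability:   reliability of the original test (0 to 1)
    length_factor: new length divided by original length (0.25 = one quarter as long)
    """
    if not 0 <= reliability <= 1:
        raise ValueError("Reliability must be between 0 and 1.")
    if length_factor <= 0:
        raise ValueError("Length factor must be positive.")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical total test with reliability .90, and a subtest one quarter as long.
subtest_estimate = spearman_brown(0.90, 0.25)
print(f"Estimated subtest reliability: {subtest_estimate:.2f}")
```

Under these assumptions the subtest estimate falls well below the total-test value, which is why subtest scores generally call for more caution than total scores.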

Expert opinions about standardized tests can be 
found in a number of sources. (See the Halpern article
referenced at the end of this chapter.) Districts
should still study the technical manuals for each 
series and examine test items, but the reviews can 
save time in narrowing the field of potential tests. 

Tests differ in the groups of students used to establish
norms. Some districts, especially those in very
urban or rural areas, should review information
about tests' norming samples. For example, rural
districts may prefer not to use tests normed on students
who are mostly from large urban districts and
instead choose tests whose norming samples include 
a substantial representation of students from rural 
districts.

Practical Considerations

Practical considerations (including questions
about test format, administration, scoring, costs,
and publisher's services) may distinguish between
an acceptable and an unacceptable test series. Care- 
ful reading of the publisher's manuals (including 
the administration manual), trying out the test in a 
real-life application, and talking with other dis- 
tricts that use the same test may help answer ques- 
tions about practicality. 

Advantages and Disadvantages of 
Commercial Tests 

Because of the tremendous resources that test publishing
companies devote to developing standardized
instruments, commercially available tests
offer several distinct advantages over locally developed
tests. But even the high technical quality of
commercial tests cannot overcome certain disadvantages.
The list below includes some of these important
advantages and disadvantages.

Advantages include: 

• High technical quality, 

• Norms to allow comparisons with external
groups, 

• Free consulting from publishers' representa- 
tives, 

• No developmental costs. 

Disadvantages include: 

• No reason to expect a good match between what
is tested and what a district emphasizes in its 
teaching, 

• Often too few items per objective to allow the 
test to be used for diagnostic or certain ac- 
countability and evaluation decisions (e.g., 
determining the proportion of students who 
meet an objective), 

• May be prohibitively expensive— or even impos- 
sible—to get exactly the kind of reports needed, 

• Recurring annual costs for materials and (per- 
haps) scoring. 

While the last disadvantage is unavoidable, the 
impact of the first three can be reduced by careful 
selection and planning. 



Choosing among Tests: Does It 
Really Matter? 

Content emphasis differs among the various stan- 
dardized achievement tests. The two charts below 
emphasize this point graphically. They are based on 
data from a study conducted by the Institute for Re- 
search on Teaching (IRT) at Michigan State Uni- 
versity (Freeman, et al., 1983). That study compared 
the content coverage of commonly used fourth 
grade math textbooks and standardized tests. 

The charts show the comparisons between several 
tests and Scott-Foresman's Mathematics Around Us.
The top chart shows that for even the best-matched 
test (the Metropolitan), 25 percent of the test items 
are not covered in the text. The test with the best 
match on coverage (the Iowa— see bottom chart) 
still measures less than 25 percent of the topics cov- 
ered in the textbook. 

The graphs show that since no standardized test 
perfectly measures the curriculum, there will 
always be tradeoffs. The matching activity de- 
scribed in the next section can help districts decide 
what tradeoffs to accept. 



[Chart: Topics in TEST covered by text. Norm-referenced tests shown: Metro, Stanford, Iowa, CTBS.]

[Chart: Topics in TEXT covered by test (percent). Norm-referenced tests shown: Metro, Stanford, Iowa, CTBS.]

Content Validity of Standardized Tests*
(Sample Form) 



SUBJECT ____________   GRADE ____________   DATE ____________

REVIEWER ____________

TEST ____________   FORM ____________   LEVEL ____________



Directions: In the first column, write the item 
numbers of the test you are reviewing. In the 
second column, write the code numbers of the local 
objectives which match the test items. Use as many 
sheets as necessary to cover all the items in the 
test. For each test item, consider these three state- 
ments: 

• The item matches a district learning objective,
• The item is of high quality,
• The item is at an appropriate level of difficulty
for the chosen grade.



Use the following 5-point scale to show how much
you agree with each statement. Enter those codes
in the last three columns.

5 = Strongly Agree 

4 = Agree 

3 = Neutral 

2 = Disagree

1 = Strongly Disagree 

NOTE: More extensive directions for completing 
this form are on the next page. 



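            Curricular Match
Item        Obj.        How         Item         OK for
No.         No.         Well?       Quality?     Grade?
______      ______      ______      ______       ______
______      ______      ______      ______       ______
______      ______      ______      ______       ______
______      ______      ______      ______       ______
______      ______      ______      ______       ______

Totals                  ______      ______       ______
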
Compare the total scores across tests being reviewed to help make selection decisions. 



*Chart adapted from Alaska Assessment Handbook.

Directions for Completing Sample Form



1. Number the local objectives in the learning 
area under review. Identify each as more impor- 
tant or less important. 

2. Read the test manual sections that describe the 
development of the test, the content areas 
included, and the rationale for the types of 
items selected. Check to see that the general 
test objectives are in line with local objectives. 

3. In the first column, write the item numbers of 
the test you are reviewing. Read each item and 
decide if it measures one or more local objec- 
tives. Do not rely on the test publisher's descrip- 
tion or item classification chart. Enter the ob- 
jective number(s) in the second column. (When 
reviewing many levels of a test, consider sam- 
pling; randomly select 40-50 items at a mini- 
mum of three levels.) 

4. For each item that matches an objective, consid- 
er the following statement: 

"The item matches a local learning objective 
closely." 



Use the following scale to indicate how much 
you agree with the statement: 

5 = Strongly Agree 

4 = Agree 

3 = Neutral 

2 = Disagree 

1 = Strongly Disagree 

Enter the appropriate code number in the third 
column. 

5. Rate the quality of the item, indicating whether 
you agree or disagree with the following 
statement: 

"The item is of high quality." 

Using the same 5-point scale, enter the ap- 
propriate code in the fourth column. 

6. Rate each item's grade-level appropriateness. 
Again using the 5-point scale, indicate in the 
fifth column the extent of your agreement with
the following statement: 

"The item is at an appropriate level of diffi- 
culty for the chosen grade." 

7. Determine how many test items appropriately 
measure local objectives. If fewer than half the 
items match objectives, the test does not fit 
your curriculum very well. (You may decide to 
do this after Step 3, thus eliminating tests that 
match poorly before you do the work required in 
succeeding steps.) 
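
The tallying in these directions can be sketched in code for districts that keep the worksheet electronically. The class and field names, the objective codes, and the sample ratings are hypothetical, and the sketch assumes that an item counts as matching when at least one objective code has been entered for it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ItemReview:
    item_no: int
    objectives: List[str] = field(default_factory=list)  # matched objective codes
    how_well: Optional[int] = None   # third column, 1-5, only for matched items
    quality: Optional[int] = None    # fourth column, 1-5
    grade_ok: Optional[int] = None   # fifth column, 1-5


def summarize_worksheet(reviews):
    """Return the share of matched items, the Step 7 check, and column totals."""
    if not reviews:
        raise ValueError("No items were reviewed.")
    matched = sum(1 for r in reviews if r.objectives)
    share = matched / len(reviews)
    totals = {
        column: sum(getattr(r, column) or 0 for r in reviews)
        for column in ("how_well", "quality", "grade_ok")
    }
    # Step 7: fewer than half the items matching signals a poor fit.
    return {"share_matched": share, "fits_curriculum": share >= 0.5, "totals": totals}


# Hypothetical review of four test items.
reviews = [
    ItemReview(1, ["M3"], how_well=4, quality=5, grade_ok=4),
    ItemReview(2, [], quality=3, grade_ok=4),
    ItemReview(3, ["M1", "M3"], how_well=5, quality=4, grade_ok=5),
    ItemReview(4, [], quality=2, grade_ok=3),
]
summary = summarize_worksheet(reviews)
print(summary)
```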
Rating Sheet for Standardized Tests*

Respond to the items below for each test series 
being considered. (It is not necessary to rate every 
test level or form.) Write the test series being con- 
sidered in a column heading, then rate each test 
using the following scale: 

4 = Good 

3 = Fair

2 = Weak

1 = Unsatisfactory 

(Use the unsatisfactory rating for any missing
information. Tests should not receive credit for
missing information. Most test publishers know that
tests cannot be evaluated without adequate
technical information; missing information reflects
negatively on the test.)

The previous worksheet "Content Validity of Stan- 
dardized Tests" can be used to answer the alignment 
questions. The first alignment item— the match be- 
tween test items and district objectives— should be 
answered positively before a test is considered fur- 
ther. Beyond that, the test with the highest rating 
may be the one a district should choose although in- 
dividual circumstances may cause a district to in- 
clude additional criteria or to weight them dif- 
ferently. 
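
A minimal sketch of how a district might total the rating sheet, assuming each criterion is rated on the 1-4 scale above and that any weights are chosen locally. The criterion names, the weights, and the function name total_rating are hypothetical; missing information is scored 1 (Unsatisfactory), as the sheet directs.

```python
def total_rating(ratings, weights=None):
    """Sum the 1-4 ratings for one test series, with optional local weights.

    ratings maps criterion -> rating, where None means the information
    was missing and is therefore scored 1 (Unsatisfactory).
    """
    weights = weights or {}
    total = 0.0
    for criterion, rating in ratings.items():
        score = 1 if rating is None else rating
        if not 1 <= score <= 4:
            raise ValueError(f"rating for {criterion!r} must be 1-4")
        # Unlisted criteria default to a weight of 1.
        total += score * weights.get(criterion, 1.0)
    return total


# Hypothetical ratings for two test series on three criteria.
ratings = {
    "Test A": {"objectives measured": 4, "reliability": 3, "cost": None},
    "Test B": {"objectives measured": 3, "reliability": 4, "cost": 4},
}
weights = {"objectives measured": 2.0}  # assumed local emphasis
totals = {name: total_rating(r, weights) for name, r in ratings.items()}
best = max(totals, key=totals.get)
```

The highest total is only a starting point; the first alignment item still has to be answered positively before a test is considered further.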



ALIGNMENT CONSIDERATIONS

2. High percentage of district objectives measured?

[item illegible in source]

4. Relative importance of district objectives
reflected in test content?

[line illegible in source]

5. Acceptable reliability (.85 or higher)?

7. Normed [remainder illegible in source]

9. Empirical norm dates match district's testing
schedule?

PRACTICAL CONSIDERATIONS

[number illegible in source] Items free of sex,
cultural, and ethnic bias?

14. Cost for consumables, scoring, reporting, and
other services within budgetary limitations?

[item illegible in source]

16. Adequate coverage (enough test levels for grades
scheduled for testing)?

18. Related tests available if district wants them
(e.g., co-normed aptitude measure or
achievement tests for other learning areas)?

TEST TOTALS


*Chart adapted from Alaska Assessment Handbook
Many local planning groups may decide to construct
at least some local tests, assessment items, or other
assessment procedures to use in conjunction with
other approaches such as commercial tests.
Although local construction is a difficult,
time-consuming process that demands rigorous
attention to quality, many resources are available
to help guide it. The rewards of having good
assessment procedures that are particularly
appropriate or valid for local objectives can be
worth the effort.

Local staff who decide to construct assessment 
procedures can use several different sources of 
items. They can purchase commercial item banks, 
use an item bank developed by the Illinois State 
Board of Education (ISBE) and available through 
local Educational Service Centers (ESCs), and 
write their own items. 



Item Banking 

Item banks are collections of assessment items. 
Local staff can review the items and select those 
which they want to use. Some item banks are com- 
puterized; others are simply stored on paper. Gener- 
ally, the items have been pilot-tested with a sample 
of students, and several types of information are 
available for each item. For example, items in the 
bank developed by ISBE indicate the learning area, 
goal, and knowledge/skill assessed. The items also 
include data from samples of Illinois students who 
took the items during a spring 1987 pilot assess- 
ment. The data include the grade level of the stu- 
dents who took each item, the proportion of stu- 
dents who selected each response alternative, and 
other variables. 

Item banks can be very useful sources of items.
Most contain hundreds, perhaps thousands, of
items. Many items have been carefully constructed,
reviewed, and pilot tested. Data about item
difficulty can help guide decisions about whether
or how to use an item. Of course, districts must
review the items carefully to determine their
appropriateness for measuring local objectives.
A few districts may want to build their own banks
of assessment items. They may begin with a collec- 
tion of items from various sources and gradually 
add other items they obtain or write. Having access 
to a large number of items that have been used lo- 
cally can be very advantageous in future assess- 
ments. Alternative forms of good tests can be con- 
structed relatively easily. 

Writing Assessment Items

District personnel who write their own assessment 
items must devote considerable resources, including 
training time, to that task. (Educational Service 
Center staff may be able to assist with that staff de- 
velopment.) Districts need to allocate at least the 
equivalent of several weeks of work to the writing, 
piloting, and revision of items in each selected 
learning area. 

Many guidelines are available to help local staff
write assessment items. One source, Improving Your
Test Questions by John C. Ory of the University of
Illinois, is included in the general appendix. Other
resources are listed at the end of this chapter.

After writing items, staff should edit them
carefully to control their quality. (Checklists for
reviewing items, whether locally constructed or
obtained from elsewhere, are shown in Improving
Your Test Questions.)

Finally, staff should pilot test the items by
administering them to a number of students, at
least 10 students per item. While doing this, they
should ask several questions about each item.

• Does the item appear to assess the intended 
knowledge/skill? 

• Do students clearly understand the instructions 
and the items? (Talk to some students about 
why they responded as they did.) 

• In comparison with other information such as 
teacher judgment, do those students who might 
be expected to respond correctly (or incorrectly) 
do so? 

• Are any distractors (incorrect response options) 
confusing or too blatantly incorrect? 

• Does the distribution of responses across
different sex and ethnic groups indicate that the
item is free of bias?

• Is the item at an appropriate level of difficulty? 
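
Several of the pilot-test questions above (difficulty level, distractor behavior) can be answered from a simple tally of responses. The following is a minimal sketch using invented responses from ten students; the function name item_summary is hypothetical.

```python
from collections import Counter


def item_summary(responses, key):
    """Summarize one multiple-choice item from pilot responses.

    responses is the list of options students chose, e.g. ["A", "C", ...];
    key is the correct option. Returns (difficulty, proportions), where
    difficulty is the share answering correctly and proportions maps each
    chosen option to the share of students who selected it.
    """
    if not responses:
        raise ValueError("no responses")
    counts = Counter(responses)
    n = len(responses)
    proportions = {option: count / n for option, count in counts.items()}
    return proportions.get(key, 0.0), proportions


# Ten hypothetical students answering one item whose key is "B".
responses = list("BBABBCBBDB")
difficulty, proportions = item_summary(responses, key="B")
```

With only ten students the proportions are rough; they serve to prompt discussion with students and teachers rather than to settle whether an item is sound.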



Constructing Tests from Items 

Local staff groups who write or acquire a large 
number of test items have taken a major step 
toward constructing their own local tests. Test con- 
struction, however, involves more than simply as- 
sembling the items into booklets. To construct 
tests, staff will need to perform the following tasks. 

Pilot-test the items. This task, discussed in 
the previous section, will be necessary only if 
the items have not already been pilot-tested 
with students who are similar to local students. 

Identify the test purpose. Several types of de- 
cisions about test construction will be in- 
fluenced by the intended use of the test results. 
For example, the level of difficulty of items in a 
test for assessing student mastery should be dif- 
ferent than in an achievement test. Most items 
in a mastery test should probably be at a level 
(e.g., 80%-85%) which indicates that students
have learned the content. An achievement test 
should include items at a wide range of difficul- 
ty levels. Test validity must be estimated in re- 
lation to test purpose. As will be discussed more 
extensively later (in Chapter 7), any test will be 
more valid for some purposes than others. 

Develop test specifications. This stage,
sometimes referred to as developing test
"blueprints," involves making decisions about the
composition of a test. The test construction com- 
mittee will specify how the test items should be 
distributed across one or more factors. 

Two factors the committee might particularly 
consider including in the test specifications are 
content and item difficulty level. Developing 
content specifications should include identify- 
ing categories of information to be covered and 
deciding what proportion of the items should be 
devoted to each. Such categories might repre- 
sent headings in a curriculum outline, knowl- 
edge/skill statements, or local objectives. Staff 
might want to include an equal number of 
items for each category, or they may decide that 
some topics receive more instructional emphasis 
than others and should receive comparable 
emphasis on the test. 
As mentioned previously, the difficulty level of
the items should probably vary according to the 
intended uses of the test results. For example, a 
test for identifying the proportion of students 
who have achieved an objective or mastered a 
particular content area should consist primarily 
of items which indicate mastery. A test for sort- 
ing students according to their achievement in 
a particular area should contain items that are 
at a variety of difficulty levels. 

Staff may also identify other variables to in- 
clude in the test specifications. They may want 
a test to include items that are at several dif- 
ferent cognitive levels. For example, they may 
decide that most items should assess factual 
knowledge, but that others should assess wheth- 
er students can apply that knowledge. Staff 
may want to develop test specifications for 
multiple types of content categorization. For 
example, in world geography they might exam- 
ine item distribution across knowledge/skills 
and across continents. Specifications for his- 
tory, literature, or fine arts tests might include 
identifying item distributions across major his- 
torical periods or cultures. 

A sample matrix for specifying the contents of a 
test for State Learning Goal #3 in mathematics 
is enclosed in the chapter appendix. 
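
As a rough illustration of content specifications, the sketch below distributes a fixed number of items across content categories in proportion to instructional emphasis. The category names, weights, and the function name allocate_items are assumptions made for the example, not part of the State matrix.

```python
def allocate_items(weights, total_items):
    """Distribute total_items across categories in proportion to weights.

    Uses largest-remainder rounding so the counts always sum to total_items.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    exact = {c: total_items * w / weight_sum for c, w in weights.items()}
    counts = {c: int(x) for c, x in exact.items()}
    leftover = total_items - sum(counts.values())
    # Give any remaining items to the largest fractional parts first.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts


# Hypothetical emphasis weights for three measurement topics.
weights = {"estimation": 3, "lengths, areas, volumes": 5, "unit conversion": 2}
blueprint = allocate_items(weights, total_items=20)
```

A second dimension, such as cognitive level, could be handled by running the same allocation within each content category.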

Assemble the test. This stage will include
three major activities: 1) selecting items,
2) arranging them into test booklets, and
3) developing instructions for standardized
administration of the test. When selecting items,
staff should, of course, refer to the test
specifications. However, staff should also review
each item carefully. They should consider whether
it accurately reflects local objectives/instruction
and whether it assesses knowledge or skills they
think are important. Local staff should also
examine the information that is available from
previous administrations of the item. What
proportion of students selected the correct
response? Does the distribution of responses to
incorrect alternatives suggest that the item is
poor because one or two alternatives were so
obviously incorrect that only a few students (less
than 10%) chose them? Do the item statistics
(for example, the point biserial correlation
coefficient) indicate that the item functions
acceptably well?
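
For readers unfamiliar with the point biserial correlation, the sketch below computes it for one item from invented data. It assumes items are scored 0 or 1 and that the total score includes the item itself; the function name point_biserial is hypothetical.

```python
from statistics import mean, pstdev


def point_biserial(item_correct, total_scores):
    """Correlate a 0/1 item score with students' total test scores.

    Uses r = (M_right - M_wrong) / s * sqrt(p * q), where s is the
    population standard deviation of the total scores, p is the share
    answering correctly, and q = 1 - p.
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("inputs must have the same length")
    right = [t for c, t in zip(item_correct, total_scores) if c == 1]
    wrong = [t for c, t in zip(item_correct, total_scores) if c == 0]
    spread = pstdev(total_scores)
    if not right or not wrong or spread == 0:
        raise ValueError("need both right and wrong answers and varied totals")
    p = len(right) / len(item_correct)
    return (mean(right) - mean(wrong)) / spread * (p * (1 - p)) ** 0.5


# Six hypothetical students: item score (1 = correct) and total test score.
item_correct = [1, 1, 1, 0, 0, 0]
total_scores = [9, 8, 7, 5, 4, 3]
r_pb = point_biserial(item_correct, total_scores)
```

A clearly positive value suggests that students who did well on the whole test also tended to answer this item correctly; a value near zero or below is a reason to re-examine the item.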



Staff will probably decide to arrange the test 
items by level of difficulty (from easy to 
hard) or by content area. Arrangement by dif- 
ficulty level increases the opportunity for 
students to show what they know by allowing 
them to answer the easy items before time 
runs out or before they become discouraged 
by encountering too many difficult items. On
the other hand, arrangement by topic may
make it easier to summarize the data and
will permit coverage of major topics first and 
less crucial topics at the end of the test. 
Other alternatives include arranging items 
by type (for example, keeping those requiring 
special instructions or graphics together) or 
by other content categories. 

Standard administration instructions are im- 
portant to ensure that the tests are given uni- 
formly to all students. The instructions 
might indicate, for example, directions to be 
given to students (unclear directions may
unfairly reduce students' scores), the
resources that students may use during the 
test (e.g., books, calculators), whether guess- 
ing is allowed or encouraged, and the amount 
of time allotted for the test. 

Field-test and revise the instrument. 
Before tests are used widely, they should be 
given to a small but representative sample of 
students (10 or more, depending on local cir- 
cumstances) using procedures similar to 
those described previously for piloting items. 
Each time the test is revised, the new version 
should be field-tested. Following these proce- 
dures (with items and with tests) should 
limit the amount of time and resources that 
are lost because tests do not perform as 
expected. 

Review the test for validity and lack of 
bias. This stage may occur before or after 
field testing. Regardless, major changes in 
the instrument will require additional 
reviews or field testing. The reviews should 
be conducted by panels which include teach- 
ers and others who are knowledgeable about 
the content area assessed or with bias review 
procedures. The panels should be independ- 
ent of the test construction committees; 
people who review tests for validity or nondis- 
crimination should not have been closely in- 
volved in test development. Validity and bias 
review will be discussed more extensively in
the next chapter. 
Developing Performance and
Other Alternative Assessment 
Procedures 

Most local objectives can be assessed with the
multiple-choice (or matching, true-false, or
fill-in-the-blank) items that are common to many
published and locally constructed tests. However,
some objectives, such as those requiring students
to "demonstrate" various behaviors or those related
to certain State goals, can be assessed most
effectively using alternative procedures such as
rating student writing or other performance. The
State goals which may require alternative
procedures are listed below.

As a result of their schooling, students will be able
to:

• write standard English in a grammatical, well- 
organized, and coherent manner for a variety of 
purposes (language arts, state goal #3); 

• use spoken language effectively in formal and 
informal situations to communicate ideas and 
information and to ask and answer questions 
(language arts, state goal #4) ; 

• demonstrate the basic skills necessary to partic- 
ipate in the creation and/or performance of one 
of the arts (fine arts, state goal #3); 

• demonstrate basic skills and physical fitness
necessary to participate in a variety of condi- 
tioning exercises or leisure activities such as 
sports and dance (physical development and 
health, state goal #4); 

• perform a variety of complex motor activities 
(physical development and health, state goal 
#6); and 

• demonstrate a variety of basic life-saving activi- 
ties (physical development and health, state 
goal #7). 

Alternative assessment procedures can also
sometimes provide information that supplements and
enriches data from other types of assessment. For
example, a student might be able to answer
questions about the scientific process accurately,
but still be unable to perform a laboratory
experiment properly. Observation of the latter
would provide useful information.



Currently, few performance or other alternative
assessment procedures are available commercially,
except in writing assessment. However, several
sources of information about alternative procedures
are available. As mentioned in Chapter 3,
procedures for assessing student writing have been
used widely and tested extensively in Illinois
schools. Information about those procedures is
available through ESCs. In addition, suggestions for
using and writing essay, problem-solving, and
performance items are included in Improving Your
Test Questions, which is in the general appendix.
Useful guidelines may also be found in Gronlund
(1985, Chapter 15), Popham (1981, Chapter 13), and
Roid and Haladyna (1982).

Alternative assessment procedures might include: 

• Assessing student performance by observing it 
directly (for example, observing a student per- 
forming a motor skill or participating in a 
dance or debate); 

• Focusing on processes used in performance or in
product development (e.g., making preliminary
sketches of a drawing, interpreting music
notation, or deciding how to use tools such as the
body or props in a dramatic performance);

• Assembling and rating portfolios or other
artifacts of student work such as:

—visual artworks (photographs, paintings,
pottery),

—audiotapes of music performances,
—dance choreography,
—videotapes of dramatic performances, or
—journals or sketchbooks;

• Assessing the performance of students
individually or as members of a group such as a
sports team or band;

• Having student performance assessed by various
persons: the individual students themselves,
peers, teachers, content experts such as
artists, or audience members;

• Using external stimuli (for example, reproduc- 
tions of visual art, audiotapes of musical perfor- 
mances, or videotapes of dance or dramatic per- 
formances) to assess various kinds of knowledge 
with multiple-choice tests. 

When adopting performance or other alternative
assessment procedures, local planning groups must
develop standardized instructions for eliciting the
relevant student behavior, as well as criteria and 
systematic methods for assessing that behavior. 
Teachers should be trained in the process. The infor- 
mation must be thoroughly documented and main- 
tained in a centrally located file. 
Appendix: Chapter 6
Sample Test Specification Matrix
Fundamental Learning Area: Mathematics

Mathematics
State Goal for Learning 3

As a result of their schooling, students will be able
to make and use measurements, including those of
area and volume.

GENERAL KNOWLEDGE/SKILLS RELATED TO GOAL 3

The following knowledge, processes, and skills are
related to this State Goal for Learning:

Measurement in various contexts using appropriate
units

Estimation of measurements

Computing lengths, areas, and volumes in common
geometric figures

Conversion of units within one system and from one
system to another

Application of selected measurement systems,
instruments, and techniques

Areas of Focus - Test Design

Mathematics Ability

1. RECALL: the ability to recall and recognize
facts, definitions, and symbols quickly. Perception
is the primary mental act used.

2. COMPUTATION: the ability to perform
computations, procedures, and complex counting
where the operations are indicated.

3. UNDERSTANDING: the ability to understand
concepts, facts, and processes. [Remainder of
definition illegible in source.]

4. PROBLEM SOLVING: the ability to solve complex
word problems. Several of the following operations
must be involved: interpretation of the question,
identification of the relevant data from the given
information, decisions about which operations need
to be performed on the data, correct performance of
the operations, and interpretation of the results.
Chapter 7: Some Important Considerations

When developing assessment systems, local
planning teams may ask questions such as:

How do we ensure that assessment is as valid 
and reliable as possible? 

How do we avoid biasing the assessment 
against certain groups of students? 

Can we adopt a commercial test, a locally
developed test, or some other assessment procedure
that will meet most assessment needs?

Are we focusing too much on assessing lower- 
level cognitive skills? 

How can assessment become more efficient?

What should we consider when we develop an 
assessment schedule? 

Local staff will return to some of these questions 
repeatedly. For example, validity, reliability, and 
bias are important when selecting commercial tests 
(or items), developing local items/tests, planning 
how to use them, and incorporating them into a 
comprehensive assessment system. Later, staff 
should return to these topics when administering 
the assessment, interpreting the results, and 
making consequent decisions. 
Maximizing Validity and
Reliability 

Districts are required to provide assurances
regarding the validity and reliability of each
assessment procedure included in local Learning
Assessment Plans (LAPs). Both are discussed here,
but districts should also consult other resources
such as those listed at the end of this chapter.

Validity 

Validity is the extent to which a test or other as- 
sessment procedure is capable of producing informa- 
tion that warrants a particular type of interpreta- 
tion. For example, does a test adequately represent 
a particular objective or set of objectives? Can data 
from the test provide a reasonable estimate of
student accomplishment of the objective(s)? Validity
must always be estimated in the context of a partic- 
ular intended use of data from an assessment proce- 
dure. Most assessment procedures are more valid 
for some purposes than others. For example, a test 
may be excellent for estimating the overall status 
of a district's students on a specific set of learning 
objectives in mathematics. However, that same test 
might be much less useful for diagnosing the needs 
of individual students and completely inappropriate 
for evaluating the effects of a mathematics 
program. 

Districts may accumulate evidence of validity
using at least three different strategies:
1) obtaining evidence about the validity of
published tests from test manuals or by writing to
the publishers, 2) asking a panel of teachers and
other content experts to review an assessment
procedure to estimate its validity, and
3) gathering empirical data to examine validity.
Districts should attempt to accumulate multiple
kinds of evidence of validity. Each should be
associated with a particular intended use.

Local planning staff may decide to examine validity
by asking local panels to review content validity.
Such panels are most likely to review locally
constructed items/tests; however, reviews will also
help establish the appropriateness of publishers'
tests for designated local uses. The local panels
should include teachers and other content experts
(such as curriculum specialists) who are thoroughly
familiar with local objectives and instruction.
Assuming that a panel is examining the validity of
an assessment procedure for estimating whether
students have attained local objectives, the
panel's major task is to compare the assessment
procedure to local objectives.



A panel that examines validity might ask questions 
such as: 

Do the items in the assessment procedure ade- 
quately represent the objectives that are pur- 
portedly measured? 

Are there at least 3-5 items for each objective? 
(More may be required for certain test uses 
such as diagnosing the instructional needs of in- 
dividual students.) 

Are some knowledges/skills neglected? 
Overrepresented? 

Are some items irrelevant? 

Are the items at an appropriate level of detail 
for the intended use? 
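
Two of the panel questions above (enough items per objective, and irrelevant items) lend themselves to a simple count once the panel has recorded which objectives each item measures. This is a minimal sketch with invented judgments; the function name coverage_report is hypothetical, and the default minimum of 3 reflects the lower end of the 3-5 range mentioned above.

```python
from collections import Counter


def coverage_report(item_objectives, objectives, minimum=3):
    """Count items per objective and list gaps and unmatched items.

    item_objectives maps item number -> list of objective numbers measured.
    Returns (under_covered, irrelevant): objectives with fewer than
    `minimum` items (mapped to their counts) and items matching no objective.
    """
    counts = Counter(obj for objs in item_objectives.values() for obj in objs)
    under_covered = {o: counts[o] for o in objectives if counts[o] < minimum}
    irrelevant = sorted(i for i, objs in item_objectives.items() if not objs)
    return under_covered, irrelevant


# Hypothetical panel judgments for a seven-item quiz and three objectives.
judgments = {1: [1], 2: [1], 3: [1], 4: [2], 5: [2], 6: [3], 7: []}
under_covered, irrelevant = coverage_report(judgments, objectives=[1, 2, 3])
```

A panel using the test for individual diagnosis might raise the minimum, since more items per objective may be required for that use.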

Ideally, districts should assemble local panels every 
few years to review the continuing validity of as- 
sessment procedures. Objectives and instruction 
are likely to change over time. Validity panels 
which systematically review local assessment 
procedures can help ensure that the procedures 
remain valid. 

Planning groups which decide to gather their own 
empirical data to examine validity can use refer- 
ences such as Gronlund (1985) or Popham (1981) 
for guidance. 

Reliability 

Reliability refers to the consistency or stability
of assessment results. Several different types of
reliability have been identified. One of the most
common refers to consistency across time. It asks,
for example, to what extent assessment results
would have been the same if a test had been
administered a few weeks earlier or later, or
whether factors such as student guessing, fatigue,
or motivation strongly affected the results.
Unreliable assessment results are not, of course,
worthy of use in educational decision making of
any kind.

Local staff can obtain estimates of reliability
from test publishers or develop their own
estimates. For the latter, staff can refer to
several resources such as Gronlund (1985) or others
listed at the end of this chapter. They can also
obtain computer programs that estimate reliability.
Regardless of the source, local staff should
carefully examine whether the method used to
compute reliability provides evidence of the
type(s) of consistency most important for their
purposes.

Gronlund (1985) describes several different
methods of estimating reliability, as shown in
Table 7.1.
Table 7.1

Methods of Estimating Reliability*

Method                  Type of Reliability Measure   Procedure

Test-retest             Measure of stability          Give the same test twice to
                                                      the same group with any time
                                                      interval between tests from
                                                      several minutes to several
                                                      years.

Equivalent forms        Measure of equivalence        Give two forms of the test to
                                                      the same group in close
                                                      succession.

Test-retest with        Measure of stability and      Give two forms of the test to
equivalent forms        equivalence                   the same group with increased
                                                      time interval between forms.

Split-half              Measure of internal           Give test once. Score two
                        consistency                   equivalent halves of test
                                                      (e.g., odd items and even
                                                      items); correct reliability
                                                      coefficient to fit whole test
                                                      by Spearman-Brown** formula.

Kuder-Richardson        Measure of internal           Give test once. Score total
                        consistency                   test and apply
                                                      Kuder-Richardson** formula.

*Gronlund (1985), p. 89.

**For additional information about these formulas, see Gronlund (1985) or Popham (1981).
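
The two single-administration procedures in the table can be sketched as follows. The example assumes items scored 0 or 1, uses the Kuder-Richardson formula 20 with the population variance of total scores (texts differ on this convention), and uses invented data; the function names are hypothetical. Gronlund (1985) and Popham (1981) remain the references for the formulas themselves.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def split_half(scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    scores holds one list of 0/1 item scores per student.
    """
    odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)


def kr20(scores):
    """Kuder-Richardson formula 20 for 0/1 item scores."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    p = [mean(row[i] for row in scores) for i in range(k)]
    pq = sum(pi * (1 - pi) for pi in p)
    return k / (k - 1) * (1 - pq / pvariance(totals))


# Five hypothetical students, four items each.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
```

Real estimates need far more students and items than this toy data set; the sketch only shows the arithmetic.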



Another type of reliability, which is particularly
important for performance assessment procedures, is
inter-rater reliability. It is an estimate of the
consistency of the ratings assigned by two or more
different raters. Inter-rater reliability is
estimated after multiple raters observe and assign
ratings to the same students performing a
particular task. The ratings are then compared.
Inter-rater reliability can be improved through
follow-up discussions in which raters talk about
why they assigned particular ratings and attempt
to develop common standards.
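
One simple way to compare two raters' ratings is sketched below. The handbook does not prescribe a statistic; exact agreement and Cohen's kappa (agreement corrected for chance) are used here as common choices, and the ratings and function names are invented for the example.

```python
from collections import Counter


def exact_agreement(rater_a, rater_b):
    """Share of students to whom both raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = exact_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (1-4) given by two raters to the same eight students.
rater_a = [4, 3, 3, 2, 4, 1, 2, 3]
rater_b = [4, 3, 2, 2, 4, 1, 3, 3]
agreement = exact_agreement(rater_a, rater_b)
kappa = cohen_kappa(rater_a, rater_b)
```

The students on whom the raters disagree (here, the third and seventh) are natural starting points for the follow-up discussions described above.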

Several important points to remember about relia- 
bility are listed below: 

• The time interval between test and retest is
very important and often presents a dilemma.
The interval should not be so brief that students
remember answers from one test administration
to the next. However, it should not be so
lengthy that factors such as student learning or
maturation are likely to cause actual changes.

• Reliability is a necessary but not sufficient
condition for validity. Without reliability, validity
is a moot question. What do assessment scores 
measure if they are not trustworthy? However, 
scores can be reliable but not valid; they can be 
very stable but still not assess the intended 
content. 

• Reliability can be estimated only by using a 
statistical procedure. Unlike validity, it cannot 
be estimated judgmentally by a panel of experts. 
Also, reliability is applicable only to tests, not 
to individual assessment items. 

• The split-half and Kuder-Richardson methods
of estimating reliability may appear more feasible
because they do not require a test to be
administered twice. Indeed, they are very useful
for examining the equivalence of two or more
test forms (including different forms constructed
from item banks). However, they provide no
information about stability across time.

The above discussion applies primarily to
norm-referenced testing. The reliability of scores
from locally constructed or other
criterion-referenced tests can often be estimated
using similar methods, but adaptations may be
necessary. For example, Popham (1981) has suggested
studying the consistency of decisions rather than
scores. That is, how consistent, across time or
test forms, are decisions such as the proportion of
students who meet a set of objectives or cutoff
score? Also, Popham points out that the smaller
number of items on many local or
criterion-referenced tests may reduce the size of
reliability coefficients; districts may need to
decide that lower coefficients are acceptable.
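
The idea of studying the consistency of decisions can be illustrated with a short sketch. It assumes the same students took two administrations (or two forms) and that mastery means scoring at or above a cutoff; the scores, the cutoff, and the function name decision_consistency are invented for the example and are not taken from Popham.

```python
def decision_consistency(scores_first, scores_second, cutoff):
    """Share of students classified the same way on two administrations.

    A student is counted as consistent when both scores fall on the same
    side of the cutoff (both at or above it, or both below it).
    """
    if len(scores_first) != len(scores_second) or not scores_first:
        raise ValueError("need two non-empty score lists of equal length")
    same = sum(
        1
        for a, b in zip(scores_first, scores_second)
        if (a >= cutoff) == (b >= cutoff)
    )
    return same / len(scores_first)


# Hypothetical scores for six students on two forms of a 20-item test.
first = [18, 12, 15, 9, 20, 14]
second = [17, 15, 13, 10, 19, 16]
consistency = decision_consistency(first, second, cutoff=15)
```

Students whose scores sit near the cutoff account for most inconsistent decisions, which is one reason a short test may classify them unreliably.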

Avoiding Bias 

Districts must also provide assurances on their
Learning Assessment Plans (LAPs) that they have
taken steps to ensure that their assessment
procedures "are nondiscriminatory in relation to
race, sex, or national origin." Districts may also
want to examine other potential types of bias such
as discrimination against students with
disabilities, students from either urban or rural
backgrounds, students from low socioeconomic
groups, or students from specific religious groups.

Bias review procedures may include several dif- 
ferent types of questions. 

Do any items perpetuate stereotypes about a
particular race, gender, or national origin?

Are various groups represented fairly? 

Are any items likely to be offensive to members 
of some groups? 

Have all groups had an equal opportunity to ac- 
quire the knowledge/skill that is assessed? 

Do answers to questions depend on knowledge
that is not taught in school but that some
groups are more likely than others to have
acquired elsewhere?

Is the language used likely to be reasonably
familiar to all groups?

Most bias review procedures can be categorized as
either judgmental or statistical. Judgmental
review is usually conducted by a committee or
panel which reads assessment items/procedures,
asks questions such as those listed above, and
identifies items which appear to be biased. Those
items are then either revised or deleted.
Statistical review involves examining data about
the performance of various groups of students on
the assessment items/procedures. A simple but
useful type of statistical information is the
proportion of students from each group who
answered an item/procedure correctly (commonly
known as the item difficulty level or "p" value).
Items which appear to have been much more
difficult for some groups than others should be
reviewed judgmentally to determine whether the
differences were caused by item bias.
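
The statistical screen described above can be sketched as a comparison of "p" values by group. The group labels, the records, and the 0.15 gap used to flag an item are assumptions made for illustration; a flagged item is only a candidate for judgmental review, not a finding of bias.

```python
from collections import defaultdict


def p_values_by_group(records):
    """Proportion answering one item correctly, computed for each group.

    records is an iterable of (group, answered_correctly) pairs.
    """
    right = defaultdict(int)
    total = defaultdict(int)
    for group, correct in records:
        total[group] += 1
        right[group] += 1 if correct else 0
    return {g: right[g] / total[g] for g in total}


def needs_review(p_by_group, max_gap=0.15):
    """Flag the item when group p values differ by more than max_gap."""
    return max(p_by_group.values()) - min(p_by_group.values()) > max_gap


# Hypothetical results for one item from two groups of ten students each.
records = (
    [("group 1", True)] * 8 + [("group 1", False)] * 2
    + [("group 2", True)] * 5 + [("group 2", False)] * 5
)
p_by_group = p_values_by_group(records)
flagged = needs_review(p_by_group)
```

Because groups may differ in overall achievement for reasons unrelated to the item, a gap by itself does not establish bias, which is why the judgmental step follows.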

Some bias review experts prefer either judgmental
or statistical review procedures. However, a more
thorough bias review can be conducted if both types
of procedures are used interactively. Each has
different strengths and weaknesses. Neither is
sufficient alone. To counterbalance the weaknesses
and capitalize on the strengths, both should be used.

Bias reviews should be conducted at several stages
of the assessment cycle. Item writers should be
sensitized to the types and sources of bias. Later,
the items should be reviewed by others who are
representative of various groups and knowledgeable
about learning area content or statistical
procedures. During the test selection process, the
tests should be examined for bias. After tests have
been administered, results should be reviewed
statistically.

Districts might establish bias review procedures
that specify (a) the types of procedures that will be
used, (b) the types of committees or panels that will 
be involved in the process, and (c) the stages of as- 
sessment at which bias review will be conducted. 
Guidelines for bias review are included in Bias 
Issues in Test Development (National Evaluation 
Systems, Inc., 1987), which was distributed to Illi- 
nois school districts previously. Before developing 
(or adopting) specific procedures, local planning 
groups should decide what they hope to accomplish 
through bias review: 

—to treat females and minority students fairly, 

—to prevent public controversy and lawsuits, 

-to avoid offending members of selected groups, 
and/or 

—to avoid perpetuating stereotypes. 

Awareness of these purposes will help staff identify 
the types of questions to include in the procedures. 

Using Multiple Approaches

Districts should use a variety of assessment proce- 
dures. As indicated in Chapter 3 and elsewhere, dif- 
ferent types of procedures have different strengths 
and weaknesses. By using a variety of approaches, 
local staff can assemble a more comprehensive and 
accurate portrait of student learning. For example, 
a publisher's standardized shelf test may measure 
certain elements of a set of mathematics objectives 
but neglect others. Staff may decide to adopt local-
ly developed items to complement a publisher's test.
In another example, a district may have tests or 
items that assess students' understanding of the 
scientific concepts that are imbedded in a particular 
goal. However, those assessment procedures may 
not indicate whether students can perform related 
scientific tasks, e.g., design and conduct an experi-
ment or analyze the composition of a chemical com-
pound. Local planning staff may decide to assess
these skills by observing students performing the 
tasks; planning staff would establish standardized 
performance situations as well as uniform criteria 
and procedures for rating student performance. 



Assessing Different Cognitive
Levels 

When selecting or developing assessment proce- 
dures, staff should examine the level of student un- 
derstanding that each assesses. For example, they 
may want to use the classification scheme (knowl- 
edge, comprehension, application, analysis, synthe-
sis, and evaluation) presented in Bloom's Taxonomy
of Educational Objectives (1956). Or, they may want
to develop a simpler three-category system.
Planning staff should make certain that they 
assess various cognitive levels and that local assess- 
ment procedures are not weighted too heavily 
toward the lower levels of knowledge and compre- 
hension. To assist with this task, districts may de- 
velop checklists to indicate the level of understand- 
ing assessed by each item. 
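
One way such a checklist might be tallied is sketched below in Python. The item names and their assigned levels are hypothetical, and the handbook does not set a numeric limit on lower-level items; the sketch simply reports the share so that planning staff can judge it.

```python
from collections import Counter

# The six levels named in Bloom's Taxonomy, lowest to highest.
BLOOM_LEVELS = [
    "knowledge", "comprehension", "application",
    "analysis", "synthesis", "evaluation",
]
LOWER_LEVELS = {"knowledge", "comprehension"}


def level_shares(item_levels):
    """Return the proportion of items assigned to each cognitive level."""
    if not item_levels:
        raise ValueError("the checklist contains no items")
    counts = Counter(item_levels.values())
    total = len(item_levels)
    return {level: counts.get(level, 0) / total for level in BLOOM_LEVELS}


def lower_level_share(item_levels):
    """Proportion of items that assess only knowledge or comprehension."""
    shares = level_shares(item_levels)
    return sum(shares[level] for level in LOWER_LEVELS)


# Hypothetical checklist: item -> level judged by a review committee.
checklist = {
    "item_1": "knowledge",
    "item_2": "knowledge",
    "item_3": "comprehension",
    "item_4": "application",
    "item_5": "analysis",
}

share = lower_level_share(checklist)
print(f"{share:.0%} of items assess knowledge or comprehension")
```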



Using Assessment Procedures for 
Multiple Purposes 

Generally, assessment procedures that are included 
in comprehensive assessment systems have been 
selected or developed primarily for specific purposes 
such as assessing students on a particular set of 
learning objectives. However, districts may be able 
to improve assessment efficiency by adopting proce- 
dures that can be used for more than one purpose. 
For example, some standardized publishers' shelf 
tests might be used to assess several different sets 



of objectives (e.g., in reading and mathematics), to 
report local and national comparisons to the public, 
or to examine the effectiveness of Chapter 1 pro- 
grams. To the extent feasible, planning staff should 
attempt to identify assessment procedures that can 
be used for multiple purposes. However, the appro- 
priateness of each procedure for its major intended 
use should always remain the top priority. 



Developing an Assessment 
Schedule 

Comprehensive assessment systems may include 
schedules which specify the grade levels and time 
of year each assessment procedure will be 
administered. 

When deciding which grade levels to assess, local 
planners might consider factors/questions such as 
the following. 

• P.L. 84-126 requires students in grades three, 
six, eight, and eleven to be assessed annually on 
local objectives in all six fundamental learning 
areas. (These requirements are being phased in 
under a schedule specified in the Rules and 
Regulations and elsewhere.) 

• Should students in prior grade levels be as-
sessed to identify and address instructional
needs and increase the proportion of third,
sixth, eighth, and eleventh graders who meet
local objectives?

• Does the district's organizational structure sug-
gest grade levels at which it might be partic-
ularly useful to assess students, e.g., just
before they advance to other school levels?

• Do local curricula or textbooks suggest grade
levels at which it might be especially useful to 
assess students? 

Planners may also consider several factors when 
they decide what time of year to administer each as-
sessment procedure.

• Assessment should be spread out over the 
school year so that information for improving 
instruction is available continuously. 

• State assessment of third, sixth, eighth, and
eleventh grade students occurs in April as speci-
fied in the Illinois School Code and elsewhere.

• Districts should allow sufficient time for scor-
ing and data compilation between an assess- 
ment procedure's administration and its inter- 
pretation and use. 

Chapter 8: Administering Assessment Programs*







Things to Do Before, During, and 
After Testing 

Often local staff devote much effort to designing as-
sessment programs: selecting standardized tests,
developing local tests, and so on. Once programs are
designed, though, additional thought about how to
administer them is needed.

Important details must be attended to before, 
during, and after tests are administered. Some 
details can be handled by one person in a district; 
others require a cooperative effort among admin- 
istrators, teachers, students, and even parents. 
Checklists in the chapter appendix outline responsi- 
bilities at the district, school, and classroom levels. 
The information here explains some of those activi-
ties further.

Before Testing Occurs 

• Determine the number of tests, answer sheets 
and manuals needed for each grade. Ordering 
extra tests and answer sheets (say five to ten 
percent more than needed) for an emergency is 
a good idea. 

• Because of test security, do not stockpile stan- 
dardized tests. On the other hand, allow plenty 
of time for receiving materials and for distribut- 
ing them to schools. Ordering materials three 
months before planned use should allow enough 
lead time. 

• As materials arrive, check to make sure that 
everything necessary was received. Open boxes 
and determine that sufficient quantities of all 
materials were received. 

• As testing time approaches, organize the mate-
rials for easy distribution. To preserve security, 
do not distribute tests to schools until about 
two weeks before testing, but do arrange them 
for later distribution and keep them in a locked 
central location. 
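
The quantities and lead times in the list above lend themselves to a quick calculation. The Python sketch below is one possible way to work them out; the function names are hypothetical, "three months" is approximated as 90 days, and the enrollment figure and testing date are invented for the example.

```python
from datetime import date, timedelta


def order_quantity(students, extra_percent=10):
    """Number of test booklets or answer sheets to order, including spares.

    extra_percent should fall in the suggested five-to-ten percent range;
    the spare count is rounded up to a whole booklet.
    """
    spares = (students * extra_percent + 99) // 100  # integer ceiling
    return students + spares


def key_dates(testing_date):
    """Latest ordering date and earliest distribution date for a testing date."""
    return {
        # "Three months before planned use", approximated here as 90 days.
        "place_order_by": testing_date - timedelta(days=90),
        # Tests stay in a locked central location until about two weeks before.
        "distribute_to_schools": testing_date - timedelta(days=14),
    }


# Hypothetical district: 240 third graders tested on April 11, 1989.
print(order_quantity(240, extra_percent=10))
print(order_quantity(240, extra_percent=5))
for label, day in key_dates(date(1989, 4, 11)).items():
    print(label, day.isoformat())
```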



*Chapter adapted from Alaska Assessment Handbook. 

When Testing Occurs

• Be sure that testing rooms have accurate clocks
with second hands. If not, test administrators 
can use stopwatches and write the time on 
chalkboards at intervals to inform students of 
how much time remains. 

• Read the administration manual and test care-
fully before the actual testing period starts.
Highlight and be prepared to explain any direc- 
tions that students might not understand. 



Practical Tip 

Students get cues about how to react to a 
test from teachers. They need to know the 
teacher wants the test information. If a teacher 
reads the directions carefully and proctors the 
test rigorously, students are more inclined to give 
their best effort. But if a teacher conducts the
testing too casually and haphazardly, paying 
little attention to the test directions or 
to class behavior, students may react 
accordingly and the test will not suffi- 
ciently indicate what they know. 



After testing is finished: 

• If the school or district has a history of lost
mail, make copies of the answer sheets before 
sending them for scoring. 

• Clearly specify any special scoring options, sum-
mary reports, or data handling requests. Going
back later and performing subsequent analyses
is much more expensive than doing them the
first time through. (Decisions about report con-
tents and table formats should have been made
when tests were selected or developed.)

When scores are received: 

• Let students know how they did on the test per- 
sonally, or if only group results are available, 
how their group performed. Taking a test and 
then never learning the results can be very 
frustrating. If appropriate, teachers may use 
the discussion of test results as an opportunity 
for review or reteaching.

Preparing Students to Take Tests 

The best way to prepare students to take tests is to 
be sure that they have mastered the content the 
tests measure. However, research has shown that



some students are "testwise"; that is, they have cer-
tain skills which are independent of their knowl-
edge of the content tested but which make them 
better at taking tests. Consequently, there is a 
small but consistent difference in test scores in 
favor of students who are testwise.

Testwiseness can be taught, and teaching it is con-
sidered ethical. The following list includes sugges-
tions that do not make students any "smarter," but
help ensure that they get credit for what they do
know. 

Testwise students: 

• Choose the most correct answer, even if more 
than one choice is partly true. 

• Always estimate the answer for a number prob- 
lem before working it to see if the final result is 
reasonable. 

• Look at the questions which accompany a read- 
ing passage or story problem before reading the 
passage to notice the information that is 
needed. 

• Watch out for "None of the above" and "All of
the above" answer choices. These choices make 
items harder because students have to decide 
whether any answers are right or whether more 
than one is correct. 

• Guess if they don't know an answer, especially 
if they are sure one or more of the responses is 
obviously wrong. 

• Skip the hardest questions on the first pass
through a test. Answer the easier items first,
then go back and work on the harder ones.

• Carefully mark answers on computer-read 
answer sheets. Erase marks completely when 
changing an answer, and do not put stray 
marks on the sheet. Make marks dark enough 
for the computer to read. 

• Stay aware of the time. Note what time it will 
be when ten minutes remain; use those ten mi- 
nutes to review answers. 

• Get a good night's sleep before a test, and eat a 
good breakfast that morning. Avoid drinking a 
lot of liquids or eating a big meal right before 
the test. 

• Stay calm, relax, and concentrate. A little anx-
iety about tests is okay, but testwise students
do not let tests make them overly tense or
upset. (Hill [1980] discusses the effects of high
test anxiety and suggests ways to reduce it.)

What Makes a Test
"Standardized"?

A test that is labeled standardized is assumed to be 
given to all students under the same administration 
conditions. Those conditions are the ones used 
when the norm group was tested. If the standard 
conditions are not met, the comparison of results 
with the norms is invalid. Several changes in ad- 
ministration procedures could invalidate results, 
including the following, all of which should be 
avoided. 

1. Reading the directions haphazardly. The ad- 
ministration manual specifies which directions 
must be read verbatim and which can be 
paraphrased or expanded. Pay attention to 
these directions. In most cases, there will be a 
chance to clarify instructions when students re- 
spond to practice items. 

2. Not timing the test exactly. Allowing more 
time gives students an unfair advantage over 
the norm group. Spending less time than the
norm group may lower students' scores unfairly. 

3. Reading questions aloud that are meant to 
be read silently by students, defining words
in test items, or explaining what items are 
asking. Answer procedural questions, but 
never answer questions about content. 

4. Translating items. Students who are new to 
an English speaking school and usually commu- 
nicate in another language should probably be 
excused from taking a test in English because
the scores will not accurately represent what
they know. Translating items into a student's 
native language is not acceptable when the stu- 
dent's skills are being compared to a norm 
group. 

In summary, normative information that publishers 
provide about student test scores is based on the as- 
sumption that directions and other testing condi- 
tions are the same as when the test was adminis- 
tered to the norm group. Deviation from publishers' 
instructions will invalidate that information. 



Teaching to the Test vs. Teaching 
the Test 



During meetings held before school starts in the
fall, a district's teachers pore over copies of their
standardized achievement test, which will be ad-
ministered in April. Some teachers are observed
copying items. In another district, teachers spend
most of a morning reviewing the objectives which
their locally developed criterion-referenced tests
(CRTs) measure.

The first instance is clearly unethical. The second
is not. In fact, it can be supported. Major differences
in these situations explain why the second activity
is commendable, while the first is to be guarded
against.

In the first situation, standardized test items are
being studied, while in the second CRT objectives
are being reviewed. The CRT objectives coincide
with the district's curriculum; the tests were devel-
oped to measure that curriculum. The same state-
ment is unlikely to be true for the standardized test.

In reality, teachers in the second situation are actu-
ally studying the district's objectives when they
review CRT objectives. They are reviewing what
should be taught, rather than simply what will be
tested.



How Teachers Can Help Students

Just as students can do certain things during tests
to show their knowledge to best advantage, teachers
can do things to help students get ready for tests.
Teachers may prepare for the test in the following
ways.

• If the test is untimed, make certain that stu-
dents have sufficient time to complete it.
Students should be given as much time as they
need, as long as they are making progress.

• If a test is timed, keep students apprised of
how much time is left to complete it. Writing
the time on a chalkboard is better than an-
nouncing it.

• Do the practice items with the group. Other-
wise, many students will ignore them. Practice
items are very important, for they show stu-
dents anything unusual about the wording of
test questions and help students mark their an-
swers correctly.

• Make certain that students take the test se-
riously. Without playing on students' anxiety,
stress the importance of the test. Present the
testing situation with a positive attitude about
both the usefulness of the test and students'
ability to cope with it.


Administering an Assessment Program:*
District-Level Responsibilities 

Use this checklist to keep track of district-level responsibilities for administering a successful assessment
program. Space is provided at the end of each section for adding other activities. 



Before Testing 



Schedule testing dates for district. (Commu- 
nicate these dates to all interested parties.) 

Determine number of students to be tested. 

Determine number of tests, answer sheets, 
and test manuals needed (allow 5-10% 
extra). 

Determine which scoring options, summary 
reports, or data handling will provide the 
most useful information. 

For commercially published tests, place 
order with test publisher three months 
prior to testing date. 

For locally developed tests, complete print- 
ing of materials one month prior to testing 
date. 

If scoring will be done locally, order or pre- 
pare scoring materials (keys, report forms, 
directions, etc.). 

When test materials arrive from the pub- 
lisher or printer, check them over carefully. 

Package materials for distribution to school 
sites. 

Distribute materials to schools two weeks 
before testing. 



During Testing

Be available if schools have questions or
need additional materials.



After Testing

Check in materials returned from school
sites.

Discuss testing with school-site staff to
determine if there were problems or con-
cerns.

If scoring will be done locally, prepare and
process materials according to established
procedures.

If scoring will be done elsewhere and the
district has a history of lost mail, copy
answer sheets before mailing them.

Bundle answer sheets according to publish-
er's instructions.

Notify publisher if any out-of-level testing
has been done.

Specify any special scoring options, sum-
mary reports, or data handling desired.



When Score Reports Are Returned

Distribute test results to schools.

Train teachers in interpreting test results
to students and parents and in using results
for instructional improvement.

Share test results with concerned groups
(parents, school board, newspaper, etc.).

Review test results to analyze student per-
formance on objectives.


*Adapted from Alaska Assessment Handbook.

Administering an Assessment Program:*
School-Site Responsibilities

Use this checklist to keep track of school-site responsibilities for administering a successful assessment pro-
gram. The checklist is divided into two parts. Part I lists activities for a school coordinator. Part II lists activi-
ties for each teacher who administers tests. Spaces are provided for adding other activities.



PART I: SCHOOL COORDINATOR 
Before Testing 

Arrange for appropriate testing rooms (ade- 
quate seating and light, comfortable tem- 
perature and ventilation, no distractions 
from outside, a clock with a second hand, 
chalkboard).

Distribute test materials to teachers (in- 
cluding stopwatches, if needed). 

Provide any special test administration
training needed.

Inform parents in advance of testing dates
and offer suggestions for preparing students
for the test days.



During Testing 



Be available if teachers have questions or 
need additional materials. 



After Testing 

Arrange for make-up testing as needed.

Return materials to district coordinator.

Articulate any problems or concerns to the
district coordinator.



PART II: TEST ADMINISTRATORS 
Before Testing

Prepare students by discussing the purpose
of the test and teaching test-taking skills.

Motivate students to do their best work.

Have an extra supply of pencils available.

If testing room does not have a clock, have a
stopwatch available to post time on board.

Review the administration manual and test
carefully for any directions that may be dif-
ficult for students to understand.



During Testing 

Arrange desks to face front of room.

Check that lighting, temperature, and
ventilation are all optimum.

Put "Do Not Disturb" signs on doors.

Make sure that each student has a test
booklet, answer sheet, and pencil.

Follow all procedures as described in admin-
istration manual.

Complete all practice items with the group.

If tests are timed, keep students informed
of the time left to work.

If tests are untimed, allow students as
much time as they need to finish their work.

Circulate during testing to make sure that
all students are following the directions
and marking their answer sheets correctly.

Answer procedural but not content questions.



After Testing 

Check answer sheets for names, complete- 
ness of other identifying information, dark 
marks, clean erasures, and no stray marks. 

Return materials to district test coordinator. 



When Score Reports Are Returned 

Interpret test results for students and par- 
ents. 

Use test results for instructional improve-
ment.



*Adapted from Alaska Assessment Handbook.

Why Keep Records?

After assessment scores have been distributed to 
parents, students, teachers and administrators,
what should happen next? What assessment 
records should be maintained and for how long? 
What should be kept and what discarded? 

Records should not be kept just because "it's always 
been done." Instead, school personnel should identi- 
fy reasons for maintaining records and establish 
recordkeeping procedures appropriate for each in-
formation need.

Following are some important reasons for keeping 
assessment records. 

• Parents and students expect schools to keep 
test scores as part of a student's cumulative
record. 

• Teachers need information about student 
achievement to make effective instructional 
management decisions. 

• Districts must comply with the following assur-
ance statement which superintendents sign
and submit to the State with the learning as-
sessment plans:

The district will maintain in a central loca-
tion a copy of materials related to its Learn-
ing Assessment Plan (e.g., learning objec-
tives, descriptions of its assessment proce-
dures, assessment instruments, amendments
to this Plan, copies of information dissem-
inated to the public, and correspondence
from the State Board of Education approving
the Plan and subsequent amendments) to be
made available to the State Board of Educa-
tion for examination upon request.

-Section 210.120(c)(5) 

• School and district staff need information 
about pupil progress to report to the public and 
to evaluate the effects of curriculum and 
instruction. 



*Chapter adapted from Alaska Assessment Handbook.

• State and federal programs may require student
achievement information as part of their 
mandatory evaluation requirements. 

• A district may want to monitor the effects of its 
programs over time. 



What Types of Information 
Should Be Kept? 

A recordkeeping system needs to be comprehensive 
enough to maintain the various types of informa- 
tion that will be required. The assessment compo- 
nent of the system might include information about 
individual students, including background 
demographic characteristics and scores from dis-
trict and state tests, as well as student test perfor-
mance aggregated at the class, school and district
levels. 

Additional information that would be useful when 
considering the meaning of test scores includes:

• Each school's instructional programs, including
curriculum goals and objectives, textbooks, in-
structional approaches, and any special
programs;

• Changes that might be related to student per-
formance, including new curriculum or instruc-
tional efforts, new textbooks, changes in teach-
ing staff, changes in instructional time, and
shifts in tests or testing procedures.

The following three steps should guide the design of 
a comprehensive recordkeeping system: 

1. Determine all district information needs, at in-
dividual student, classroom, school site and dis-
trict levels;

2. Verify that any mandatory requirements asso-
ciated with state or federal programs will be
met;

3. Design data gathering and storage procedures 
that will meet information needs most efficient- 
ly and cost-effectively. 
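
As an illustration of keeping individual records that can also be summarized at the class, school, and district levels, the following Python sketch stores one record per student score and averages it at each level. The record fields, names, and scores are hypothetical; an actual district would define its own fields after completing the three steps above.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ScoreRecord:
    """One student's score on one test (fields are illustrative only)."""
    student_id: str
    school: str
    classroom: str
    test: str
    score: float


def mean_by(records, key):
    """Average score for each group produced by the `key` function."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record.score)
    return {group: mean(scores) for group, scores in groups.items()}


# Hypothetical third-grade mathematics scores.
records = [
    ScoreRecord("s1", "North", "3A", "math", 70),
    ScoreRecord("s2", "North", "3A", "math", 80),
    ScoreRecord("s3", "North", "3B", "math", 90),
    ScoreRecord("s4", "South", "3A", "math", 60),
]

by_class = mean_by(records, lambda r: (r.school, r.classroom))
by_school = mean_by(records, lambda r: r.school)
district = mean(r.score for r in records)

print(by_class)
print(by_school)
print(district)
```

Storing individual records and computing the summaries from them, rather than storing only the summaries, lets the same data answer questions at any of the levels.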



How Long Should Assessment 
Records Be Kept? 

One district successfully manages its assessment 
data using the following guidelines. 

• Score reports are returned to students, parents, 
teachers and administrators as soon as possible. 



• Individual student score reports are entered
into each student's cumulative record. 

• The district office maintains an extra copy of in-
dividual student score reports for one year.

• Grade-level score reports (e.g., group perfor- 
mance summaries) are maintained in the dis- 
trict office until students have graduated. For 
example, score reports from current first grad- 
ers will be kept until those students graduate 
from high school. 

• Grade-level score reports are stored in note- 
books by year (for example, all score reports 
from this year's assessment are kept in a single
binder). 
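
The retention guidelines above amount to simple date arithmetic, sketched here in Python. The function names are hypothetical, school years are identified by the calendar year in which they end, and grade twelve is assumed to be the graduating grade; the sketch follows the cohort and ignores students who are retained or who skip a grade.

```python
def grade_level_report_discard_year(assessment_year, grade, final_grade=12):
    """Year in which the assessed cohort graduates.

    Grade-level score reports are kept at least until this year.
    """
    if not 1 <= grade <= final_grade:
        raise ValueError("grade must be between 1 and final_grade")
    return assessment_year + (final_grade - grade)


def individual_copy_discard_year(assessment_year):
    """District office keeps its extra copy of individual reports for one year."""
    return assessment_year + 1


# First graders assessed in the school year ending in 1988:
print(grade_level_report_discard_year(1988, grade=1))
print(individual_copy_discard_year(1988))
```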



Using Assessment Records: 
Three Scenarios 

Assessment records, no matter how comprehensive
or well maintained, are not very useful by them-
selves. They become beneficial when data are used
to answer important educational questions. The fol-
lowing three examples demonstrate some effective
uses of assessment information.

In Evaluation (Scenario 1) 

A new mathematics textbook series was adopted 
two years ago for grades five through eight. Since 
then, math scores in those grades have dropped. An
evaluation is designed to learn whether the new 
textbook was a mistake. 

District assessment records include information 
about students' math scores for several years before
the new text was adopted, student performance in
other subject areas, and student background char-
acteristics. These data will be used with informa-
tion such as how well the test measures important
district goals, how well the test matches the text- 
book, and teachers' opinions about what has hap- 
pened. Assessment records will not be the only 
source of information for this evaluation, but they 
will make an important contribution. 

In Curriculum (Scenario 2) 

When assessment scores indicated that study skills
was a weak area, a district decided to examine its
language arts curriculum. Some definite weak-
nesses were found in study skills, and new objec-
tives were developed and implemented. The next as-
sessment results showed a definite improvement in
study skills although isolated areas remained weak.
Now the curriculum committee will examine the re-
maining weak areas to determine which warrant
more attention. In two years, the newest assess- 
ment scores will be examined to judge if the dis- 
trict's progress has been adequate. By using assess- 
ment records within the context of its own particu- 
lar objectives and goals, this district has 
demonstrated another way to use assessment 
records to help inform educational decisions. 



Practical Tip 

Many schools have considerable turnover 
in teaching staff from year to year. New staff
should not have to start from scratch to learn 
their students' current achievement levels. Good 
assessment records can be extremely valuable to 
a teacher who wants to know where 
students stand.



In Instruction (Scenario 3) 

Because problem solving is important in a district's 
mathematics curriculum, a special computer- 
assisted instructional program is adopted. Equip-
ment is limited; only half the students can enroll.
The selection of participants is facilitated by the 
school's complete records of problem solving scores 
on the district standardized test. Staff consult 
these records to identify students who will benefit 
most from the computerized instruction. Also, 
teachers are asked to estimate students' problem
solving ability. Used in combination, teacher judg-
ment and assessment records provide a better in-
structional decision than either source alone.



Computers Can Help in 
Recordkeeping 

Recordkeeping requires the maintenance and 
manipulation of data, tasks at which computers 
excel. Advances in computer technology have made 
computers affordable to districts that want to 
purchase computers for handling assessment 
records. But certain cautions are in order. 

• Published, computer-based recordkeeping sys- 
tems are not always well designed, and even the 
best of them may not be suitable for a given 
setting. 

• Some computers that are already in schools are 
not suitable for detailed recordkeeping with 
many students. Their storage capacities are not 
sufficient. Before deciding to use a computer 



system for assessment recordkeeping, districts 
should make sure that their records will fit.
Some schools have converted to computer sys-
tems, spent months preparing and entering
data, and then found that the storage space
would not hold all of the records.

• The system may not be flexible enough. Some
systems permit only limited types of reports
and inquiries.

• The system may not have any advantage over
paper-based systems. Unless the computer
manipulates the data to produce new informa-
tion, a paper-based filing system is just as
practical. The system should do something
other than just store information.

• A district may not have the resources to use the
system appropriately. For example, if only one
set of test scores is not entered, the integrity of
the entire system is jeopardized. A computer- 
based system sometimes makes recordkeeping 
more formal and intensive than staff are willing 
or able to undertake. 
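
Whether a district's records "will fit" can be estimated before any data are entered. The Python sketch below shows one rough way to do so; every figure in it (enrollment, tests per year, years retained, bytes per record, storage capacity, and the 25 percent headroom) is a hypothetical value chosen for illustration, and actual record sizes depend on the system being considered.

```python
def estimated_bytes(students, tests_per_year, years_kept, bytes_per_record):
    """Rough storage requirement for individual score records."""
    return students * tests_per_year * years_kept * bytes_per_record


def fits(required_bytes, capacity_bytes, headroom=0.25):
    """True if the records fit with some spare room for indexes and growth.

    The 25 percent headroom is an assumption made for this sketch.
    """
    return required_bytes * (1 + headroom) <= capacity_bytes


# Hypothetical district: 2,000 students, 6 tests a year, 12 years of
# records, and roughly 200 bytes per stored score record.
required = estimated_bytes(2000, 6, 12, 200)
print(required)
print(fits(required, capacity_bytes=20_000_000))
print(fits(required, capacity_bytes=40_000_000))
```

Running such an estimate first is far cheaper than discovering the shortfall after months of data entry.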



Large Sites vs. Small 

Recordkeeping systems that are practical in dis- 
tricts with large numbers of students may not be 
practical in smaller districts. An extensive, 
computer-based system might be very inefficient in 
a district with only five or ten students per grade. 
Under such circumstances, printed reports are easy
to use for reference. Further, there is not much in-
formation to manipulate (for example, averages are
easily done with a calculator). If some information 
necessary for decision making is not available, it 
can probably be collected from other existing 
sources relatively easily. 

On the other hand, districts with thousands of stu- 
dents cannot successfully manipulate assessment 
records by hand, nor can information be collected 
on an ad hoc basis. The recordkeeping systems for 
such districts have to be established in advance, 
probably with the help of experienced data process-
ing personnel. (For information about assessment
recordkeeping systems used in the Springfield and
Joliet school districts, see documents listed in the
reference section.)

Student Records and Compliance
with the Law 

Administrators need to comply with two federal 
laws when designing a recordkeeping system. Both 
the Hatch Amendment, with its associated 1984
Department of Education regulations, and the 
Family Educational Rights and Privacy Act of 1974 
(the Buckley Amendment) affect the handling and 
release of student records. These acts are designed 
to give parents more control over the testing and 
teaching of their children. In addition, they give 
parents and students access to their educational 
records and certain rights regarding the dissemina- 
tion of records containing personally identifiable in- 
formation. The major provisions of each law are 
summarized below. 

The Hatch Amendment 

• Parents must be given an opportunity to inspect 
instructional materials and give their consent 
before students take part in a wide range of 
classroom activities or use materials in pro-
grams receiving federal funds.

• Parents must give consent before their children
submit to "psychological tests or treatments" in- 
volving potentially embarrassing psychological 
problems, antisocial or self-incriminating beha- 
vior, criticisms of family members, and state- 
ments of family income. 



The Buckley Amendment 

• Students' educational records must be released 
upon request to parents (including a noncusto- 
dial parent) or to students 18 years of age or 
older. 

• Personally identifiable information in student 
records may be disclosed only with written ap- 
proval of parents. 

• Parents and students are allowed to correct 
errors in students' records. 

• School officials with legitimate educational 
interests are allowed access to educational 
records of a student without prior parental 
approval. 

• City or state police officers and potential
employers are not allowed to have routine 
access to student records. 

• Federal funds can be withdrawn from a district 
for noncompliance with the regulations. 

Suggestions for minimizing compliance problems
with the two laws include publicizing parents' and
students' rights in school publications (for example,
parent or student guides) and establishing a consistent
policy for discussion and complaint.

Chapter 10: Reporting Assessment Results

■ Deciding What to Report

■ Interpreting and Reporting
Assessment Data



Districts are required to develop systems for reporting
their assessment results to local residents annually.
The reporting systems shall include at least
"statements of the degree to which the district's
goals, objectives and expectations for student
achievement are being met, and if not, what appropriate
actions are being taken." That information
shall be disseminated through reports presented
at or sent to regular school board meetings, local
newspapers, and students' parents (from adopted
rules for the Learning Assessment and School Improvement
Plans, Section 210.130: Reporting
System). Districts should make several kinds of decisions
about their reporting systems, as discussed
in this chapter.



Deciding What to Report 

Local planning teams should decide what level of 
detail as well as what kinds of information to 
report. They should ask questions such as the 
following. 

Will data be summarized across objectives
(within goal, learning area, and grade level)?

Will data be reported for each objective separately?
Each assessment procedure?

Will data be aggregated across students, classrooms,
and schools (i.e., reported at the district
level only)?

Will school-level data be reported?

Will the data be aggregated in other ways also?

Will reports include anything other than data
on student progress on local expectations? For
example, will they include information such as
attendance and dropout rates? Norm-referenced
test data? State assessment data? Students' responses
to sample assessment items?

The data that districts report on student achievement
of local objectives may consist of simple percentages
of students who met local expectations or
criterion levels. Although norm-referenced tests
may be used to assess some objectives, normative
scores may not be available for the specific clusters
of assessment items used with individual objectives.
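
The "simple percentages of students who met local expectations" described above can be sketched in a few lines of Python. This is only an illustration: the objective names, the scores, and the criterion levels are hypothetical, and a district would substitute its own objectives and locally set expectations.

```python
from collections import defaultdict

def percent_meeting(records, criterion):
    """Return {objective: percent of students scoring at or above the criterion}.

    records   -- iterable of (objective, score) pairs, one per student result
    criterion -- dict mapping each objective to its locally set criterion level
    """
    met = defaultdict(int)
    total = defaultdict(int)
    for objective, score in records:
        total[objective] += 1
        if score >= criterion[objective]:
            met[objective] += 1
    return {obj: 100.0 * met[obj] / total[obj] for obj in total}

# Hypothetical data: (objective, student's percent-correct score).
records = [
    ("reading-1", 80), ("reading-1", 60), ("reading-1", 90), ("reading-1", 70),
    ("math-1", 50), ("math-1", 75),
]
# Hypothetical local criterion levels for each objective.
criterion = {"reading-1": 70, "math-1": 75}

result = percent_meeting(records, criterion)
for objective, pct in result.items():
    print(f"{objective}: {pct:.1f}% of students met the local expectation")
```

The same function could be run separately for each school or grade level if a planning team decides to report at those levels.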

Displays of assessment information regarding student
achievement of local goals and objectives may
be a major feature of a district's report to the
public. However, the report might also include
other elements such as:

• state assessment data,

• normative data from publishers' standardized
shelf tests,

• other indicators of school effectiveness (e.g.,
attendance and dropout rates, academic
awards, and constituent satisfaction),

• student performance on selected illustrative
assessment items/procedures, and

• indicators of local problems (e.g., student mobility,
absenteeism, and class size).

Interpreting and Reporting 
Assessment Data 

Because norm-referenced tests have been used 
widely for many years, considerable information 
exists about how to interpret and report the results. 
Some of that information is included in the appen- 
dix along with a few guidelines for developing 
charts. 

Interpreting data from other assessment procedures,
such as locally constructed tests or performance
assessment, and reporting the data to the
public may be difficult for local educators. Especially
during the first few years, districts may have
only rough estimates of the difficulty level of various
items and other procedures. Even establishing
realistic local expectations may be difficult. Local
staff will need to explain these complexities to their
public. With experience, they can adjust expectancies
and gradually collect data that are increasingly
meaningful.

The frequent use of norm-referenced tests and the
existence of several types of statistically derived
scores make it easy to assume that various uses of
the scores are warranted. However, staff must exercise
caution when using norm-referenced test data.
The following are several important precautions.

Be extremely careful about aggregating data
across tests. Such aggregation is not legitimate
(Frechtling & Myerberg, 1983). Although two tests
may appear to measure the same learning area,
they may do so very differently. The content of the
tests may vary widely. The format of the items may
be so different that even if the content were the
same, student performance might not be consistent
across the two tests. Also, the tests were normed on
different groups of students. Although publishers
attempt to use "nationally representative" samples,
the groups sometimes vary considerably.

Do not attempt to aggregate certain types of
normative scores. Regardless of whether the comparison
is of different groups of students or of a
single group at different points in time, small differences
may be due more to measurement errors
than actual differences in students. All scores contain
measurement error due to factors such as ambiguous
items, guessing, and distractions during
the testing situation. Even "statistically significant"
differences (i.e., differences that are supposedly
"real" because they are greater than the estimated
measurement error) should be interpreted
cautiously, especially if they are small or represent
very large groups of students. Another reason for
interpreting group differences carefully is that they
may be due more to external factors than to schools.
For example, some groups' cultural backgrounds
may have given them more opportunities to acquire
certain knowledge/skills outside of school.
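
To make the point about measurement error concrete, the sketch below uses a formula from classical test theory that this handbook does not itself state: the standard error of measurement (SEM) is estimated as the standard deviation of the scores times the square root of one minus the reliability coefficient. The standard deviation, the reliability, and the two scores are hypothetical, and treating two scores as "not clearly different" when their one-SEM bands overlap is a rough rule of thumb assumed here, not a formal significance test.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def bands_overlap(score_a, score_b, sem, width=1.0):
    """True if the bands (score +/- width * SEM) around the two scores overlap."""
    return abs(score_a - score_b) <= 2 * width * sem

# Hypothetical test: standard deviation of 10 score points, reliability of .91.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
print(f"SEM is about {sem:.1f} score points")

# Hypothetical district averages for two successive years.
print("52 vs 55 overlap:", bands_overlap(52, 55, sem))
print("52 vs 60 overlap:", bands_overlap(52, 60, sem))
```

A difference that falls inside the overlapping bands is the kind of small difference that may reflect measurement error rather than a change in student achievement.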

Be careful when comparing results across
grade levels. Again, the norms are based on different
samples which are probably not comparable.
Also, differences in test content and fidelity to the 
curriculum may help explain group differences. As 
Frechtling and Myerberg (1983) suggest, the test 
score declines that commonly appear as students 
progress through school may be due mainly to in- 
creased curricular variety and a consequent decline 
in the alignment between what is taught and what 
is tested. 

Program Effectiveness Indicators Used by
Zion-Benton High School

I. ANALYSIS OF EXTERNAL TESTING
- Educational Development Series (EDS) - academic achievement test administered to all students in
  8th, 11th, 12th grades
- ACT, SAT, PSAT, AP, NMSQT - voluntary tests taken by college-bound students

II. ANALYSIS OF DEPARTMENTAL TESTING
- Departmental Final Examination Results
- Grade Analysis

III. ANALYSIS OF AWARDS AND INTERSCHOLASTIC COMPETITION PERFORMANCE
- Academic Contests
- Artistic Contests
- Athletic Contests

IV. ANALYSIS OF BEHAVIOR AND DISCIPLINE STATISTICS
- Attendance Patterns
- Discipline Patterns
- Dropout Rate
- Suspensions
- Expulsions

V. ANALYSIS OF CONSTITUENT SATISFACTION
- Parents' Perceptions
- Teachers' Perceptions
- Seniors' Perceptions
—Graduates' Perceptions 

[Figures: four pairs of example charts, each showing a Chart 1 and a "Chart 2: Better." Recoverable labels: a pie chart of the Class of 1985's responses to the question "What was your first grade in this school district?"; bar and line charts of Percent Correct Reading Score for Females and Males, 1983 to 1985.]

What makes Chart 2
better than Chart 1?

■ Bars are wider than the space
between them.

■ Zero is included on axis. (Zero
can be omitted, but break in
axis should alert reader.)

■ Grid lines don't pass through
bars.

■ Y axis uses scale that makes
bars easy to interpret.

What makes Chart 2
better than Chart 1?

■ Too many segments in Chart 1;
limit pie charts to five or
fewer segments.

■ Chart 2's title is easier to
understand.

■ Labels should go inside
segments when there is enough
space.

What makes Chart 2
better than Chart 1?

■ Confidence band (determined
using standard error of
measurement) shows that 1985
score is not really different
from 1984's.

■ Y axis is labeled horizontally.

■ Patterns used for Chart 2's
segments don't distract reader.

What makes Chart 2
better than Chart 1?

■ Males' and females' scores are
compared on the same scale
(see Y axis).

■ Axis numbers are large enough
to read easily, and enough tick
marks are added to aid
interpretation.

■ Lines showing data are thicker
than grid lines.

■ Each data line is labeled.

* From Alaska Assessment Handbook

Properties of Different Score Systems

Name of Score: Raw Score
  Description: Indicates the total number of test items
    correctly answered by a student.
  Major Advantages:
    + Can be computed easily.
    + Can be used to assess mastery.
  Major Limitations:
    - Should not be averaged.
    - Not suitable for comparing performances on
      different tests or subtests.

Name of Score: Percentile
  Description: Indicates the percentage of students in the
    norm group whose score is less than or equal to a
    given score.
  Major Advantages:
    + Easy to explain.
    + Useful for comparing the performance of an
      individual with a norm group.
  Major Limitations:
    - Not an equal interval scale, and so cannot be
      averaged among students or subject areas.

Name of Score: Stanine
  Description: Distributes scores into nine broad intervals.
  Major Advantages:
    + Provides a general description of student
      performance level.
    + Can be averaged.
  Major Limitations:
    - Does not allow fine discrimination.

Name of Score: Normal Curve Equivalent (NCE)
  Description: Converts percentiles into a normalized
    equal interval scale suitable for computing and
    comparing gains in achievement.
  Major Advantages:
    + Has the combined advantages of percentiles and
      stanines. It can be used for comparing the
      performance of a group with that of a norm group
      and can also be meaningfully averaged.
  Major Limitations:
    - Is a scale not familiar to many.
    - Not all test publishers use it.

Name of Score: Expanded Standard Score
  Description: Is an equal interval score system which
    links several overlapping levels of a test.
  Major Advantages:
    + Makes it possible to track students longitudinally
      from grade to grade.
    + Also possible to test students at functional level
      and interpret results at their grade level.
  Major Limitations:
    - Interpretation requires familiarity with particular
      test and subtest.
    - Is given different names by different publishers.

Prepared by ECIA Chapter I Technical Assistance Center
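
The contrast between percentiles (which should not be averaged) and NCEs (which can be) may be easier to see in a small sketch. Everything numeric here is an assumption not stated in the table above: the conversion NCE = 50 + 21.06 × z (where z is the normal deviate for the percentile) is the conventional definition, the stanine cut points are the commonly published percentile boundaries, and the two student percentiles are hypothetical. Publishers' norms tables remain the authoritative source for actual conversions.

```python
from bisect import bisect_right
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()   # standard normal: mean 0, standard deviation 1
_NCE_SD = 21.06              # conventional NCE scaling constant (assumption)
# Commonly published percentile cut points between stanines 1..9 (assumption).
_STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def percentile_to_nce(percentile):
    """Convert a percentile rank (1-99) to a normal curve equivalent."""
    z = _STD_NORMAL.inv_cdf(percentile / 100.0)
    return 50.0 + _NCE_SD * z

def nce_to_percentile(nce):
    """Convert an NCE back to a percentile rank."""
    z = (nce - 50.0) / _NCE_SD
    return 100.0 * _STD_NORMAL.cdf(z)

def percentile_to_stanine(percentile):
    """Place a percentile rank into one of nine broad intervals."""
    return bisect_right(_STANINE_CUTS, percentile) + 1

# Two hypothetical students at the 50th and 98th percentiles.
percentiles = [50, 98]
mean_of_percentiles = mean(percentiles)
mean_nce = mean(percentile_to_nce(p) for p in percentiles)
percentile_of_mean_nce = nce_to_percentile(mean_nce)

print("Averaging the percentiles directly:", mean_of_percentiles)
print("Averaging NCEs, then converting back:", round(percentile_of_mean_nce))
print("Stanines:", [percentile_to_stanine(p) for p in percentiles])
```

The two averages differ because percentile units are not equal in size across the scale, which is the limitation the table notes.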

[Figure: Normal Curve Distribution of Test Scores]

A Glossary of Assessment Terms*
A through E 

■ achievement test: A test that measures the 
extent to which a student has acquired certain 
knowledge or mastered certain skills. 

■ alignment: The process of assuring that curricu- 
lum, instruction and assessment procedures all 
match each other and that communication 
among educators and administrators at all levels 
within a district is open and functional. (See
Chapter 4) 

■ assessment: The process of estimating, for
example, student attainment of learning objec- 
tives. A variety of procedures, including tests, 
can be used in the estimation. (See Chapter 1) 

■ comprehensive assessment system: A planned 
program for monitoring the achievement of an or- 
ganization's students. The program is likely to 
specify at least: a schedule that indicates the 
grade levels of students who will be assessed, the
subject areas, and the time (month) of adminis- 
tration; the assessment procedures that will be 
used; and the plans for processing, interpreting, 
and using the results. (See Chapter 1) 

■ content validity: The extent to which a test
matches the content of a given program. 

■ correlation coefficient: A measure of the 
degree of relationship between two sets of mea- 
sures for the same group of individuals. Correla- 
tion coefficients range from 0.00, indicating a 
complete absence of relationship, to +1.00 and
-1.00, indicating a perfect positive or negative
correspondence. 

■ customized test: A test that is designed specifi- 
cally for a particular school or district. The test 
may have been developed by a publisher who 
maintains a collection of learning objectives and 
matching assessment items. The test is developed 
after the district indicates which objectives to 
assess. 

■ criterion-referenced test (CRT): A test that is
designed to provide information on the specific 
knowledge or skills of a student. The scores on a 
criterion-referenced test have meaning in rela- 
tion to what the student knows or is able to do. 

■ domain-referenced test (DRT): Similar to a 
criterion-referenced test. It is designed to provide 
information on the extent of student learning in 
a specific content domain. 



■ educational significance: A judgment that
test performance, or the differences in test performance
of groups, is meaningful or important in
practical terms. This term is often contrasted
with statistical significance (see below).

■ empirical data: Data collected through observation
or experience. The test scores of local students,
for example, are empirical data.

■ empirical norm dates: The actual dates on
which a publisher tested the students in the
norm group. Publishers recommend that schools
administer tests on these dates. Testing at other
times may mean that students have received
more or less instruction than the norm group.

■ error of measurement: A statistical estimate
of the difference between an observed score and
the corresponding "true" score (the score that
would be obtained if the assessment were perfectly
reliable).

F through N 

■ grade equivalent score (GE): The grade level
for which a given score is the estimated average.

■ item analysis: The process of evaluating individual
test items with respect to certain characteristics.
Item analysis involves determining
such factors as the difficulty level and discriminating
power of an item. All such characteristics
are then used to judge the overall quality of the
item.

■ item banks: Collections of assessment items.
Generally, these are used for constructing tests
that measure selected learning objectives. With
sufficient numbers of items, multiple test forms
that assess the same objectives can be constructed.
(See Chapter 6)

■ item-difficulty level: (See "p value")

■ normal curve equivalent (NCE): A measurement
scale developed for Title I (Chapter 1) evaluation
requirements. The scale ranges from 1 to
99, with units equal in size across the score
range. The equivalence of units makes it more
possible to average scores across groups and to
aggregate results across tests.

■ norm group: The sample of students that was
given a test in order to estimate how well the student
population in general would perform on the
measure. A norm group should be as representative
as possible of the variation expected within
the general population. Key dimensions to be represented
in a norm group include ethnicity, socioeconomic
status, size of school system, location
of system (urban, rural, or suburban), public vs.
non-public schools, and geographic region of the
country.

■ norm-referenced test (NRT): A test that is designed
to provide information on how well a student
performs in comparison to other students.
The scores on a norm-referenced test have meaning
in terms of their relation to the scores of an
external reference group (the norm group).

■ norms tables: Tables presented in test manuals
or available from test publishers that show the
relationship of different types of scores to one
another (e.g., raw scores to percentiles). Tables
are usually provided for each test level and time
of testing (norms dates) as well as by grade level
of the students tested.

* Adapted from Alaska Assessment Handbook.

O through Z 

■ out-of-level testing: Administering a test at a
level below or above the one generally recommended
for a student based on his or her grade
level. Such testing is done to accommodate the
ability levels of students who are either much
above or much below the average of students
their age and thus would not be able to demonstrate
the knowledge and skills they possess.

■ p value: An index which signifies the percent of
examinees who answered a test item correctly.

■ percentile rank: An indication of a student's
standing in comparison with all students in the
norm group who took the same test. Percentile
ranks range from a low of 1 to a high of 99. A percentile
rank indicates the percent of students
who obtained scores equal to or less than a given
score.

■ pilot testing: Trying out a test or item with a
small number of students to see if it works before
giving it to a large group of students.

■ point biserial correlation coefficient: An estimate
which is often used to determine whether a
test item is functioning well by comparing individual
students' performance on the item to their
performance on the test as a whole. (Are students
with high total scores more likely to answer a
question correctly than other students?) Point
biserial correlations are between one dichotomous
variable (such as whether or not individual students
responded to an item correctly) and one continuous
variable (e.g., students' total scores on
the test). The range of the coefficients is restricted;
generally, correlations above .30 are
acceptable.

■ raw score: The number of test items answered
correctly by a student. Because different tests
have different numbers of items, raw scores
cannot be compared from one test to another.

■ reliability: The extent to which a test can be
depended upon to provide consistent, error-free
information. Reliability is usually reported as a
correlation coefficient; the closer the coefficient
is to +1.00, the higher the reliability. Types
of reliability commonly reported for tests include
test-retest, alternate forms, split-half and Kuder-
Richardson KR-20. (See Chapter 7)

■ scale score: A score that expresses the results of
a particular test for all forms and levels on a
single common scale. Scale scores allow comparisons
from grade to grade or level to level of a test.

■ standard score: A general term referring to
any of several types of "transformed" scores.
Scores are expressed in terms of standard scores
for reasons of convenience, comparability and
ease of interpretation. For example, the raw
scores from two different tests can be expressed
in more comparable terms by using standard
scores.

■ standardized (or uniform) assessment procedure:
A clearly described assessment procedure
(for example, a test) with administration directions
which were developed so that everyone will
administer the procedure in the same way. Student
performance will not vary because, for example,
administrators give different directions or
allow differing lengths of time. (See Chapters 3
and 8)

■ stanine: A standard score scale that ranges
from a low of 1 to a high of 9, with a specified percentage
of cases falling into each category.

■ statistical significance: A judgment, based on
the application of statistical calculation, that a
certain test score or the difference in scores between
separate groups is "really" different; that
is, not just apparently different because of
chance fluctuations. While statistical significance
gives the appearance of scientific truth, it
must be understood that results of statistical
analyses are very dependent on the number of
students tested. The smaller the number of scores
analyzed, the bigger the difference required for it
to be statistically significant. For this reason,
many persons talk about both statistical and
educational significance when referring to test
scores.

■ test specifications: Descriptions of the distribution
of items for a test. These distributions are
frequently used during test construction to specify
the number or percent of items that assess various
content categories. (See Chapter 6)

■ testwiseness: The possession of skills independent
of subject-matter knowledge that make it
possible for students to achieve higher test
scores. Such skills can be taught and will result
in small but consistent improvement in test
scores. (See Chapter 8)

■ uniform rating scale: Standards and criteria
for assigning value to, for example, student performance.
The purpose of a uniform rating scale
is to ensure that values will be consistent across
raters. A student's rating would be the same
regardless of who did the rating. (See Chapter 3)

■ validity: The characteristic of a test or other as- 
sessment procedure that refers to whether the 
items in an instrument can provide information 
that is needed for a particular purpose. Most 
instruments have multiple validities, one for 
each likely use of the resulting data. (See Chapter 
7) 
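
The glossary entries for item analysis, p value, and point biserial correlation coefficient can be illustrated with a short sketch. The response data below are hypothetical, and the point biserial is computed here as an ordinary Pearson correlation between the 0/1 item responses and the total scores, to which it is mathematically equivalent. The sketch assumes at least one student answered the item correctly, at least one answered incorrectly, and total scores are not all identical.

```python
import math

def p_value(item_responses):
    """Percent of examinees who answered the item correctly (1 = correct, 0 = incorrect)."""
    return 100.0 * sum(item_responses) / len(item_responses)

def point_biserial(item_responses, total_scores):
    """Correlation between a dichotomous item (0/1) and continuous total scores."""
    n = len(item_responses)
    mean_item = sum(item_responses) / n
    mean_total = sum(total_scores) / n
    covariance = sum((x - mean_item) * (y - mean_total)
                     for x, y in zip(item_responses, total_scores))
    ss_item = sum((x - mean_item) ** 2 for x in item_responses)
    ss_total = sum((y - mean_total) ** 2 for y in total_scores)
    return covariance / math.sqrt(ss_item * ss_total)

# Hypothetical results for five students on one item and on the whole test.
item = [1, 1, 1, 0, 0]      # 1 = answered this item correctly
totals = [9, 8, 7, 5, 3]    # total test scores for the same students

difficulty = p_value(item)
discrimination = point_biserial(item, totals)
print(f"p value: {difficulty:.0f}% answered correctly")
print(f"point biserial: {discrimination:.2f}")
print("acceptable by the .30 guideline:", discrimination > 0.30)
```

With only five students the coefficient is unstable; a real item analysis would use a pilot group large enough to give dependable estimates.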

Appendix B



IMPROVING YOUR TEST QUESTIONS



John C. Ory
Measurement and Evaluation Division
Office of Instructional
and Management Services
University of Illinois
at Urbana-Champaign



May be quoted in whole or in part if credit is given the source.



Improving Your Test Questions 



Choosing the appropriate type of test item to measure students' understanding of course material and 
their achievement of course goals can often be as difficult a task as writing the items themselves. The 
purpose of this booklet is (a) to inform you of the uses, advantages and limitations of the various item 
types and (b) to help you develop specific skills in writing each kind of item. 

The booklet is divided into the following sections:

I. Choosing Between Objective and Subjective Test Items

II. Suggestions for Using and Writing Test Items

      Multiple-Choice

      True-False

      Matching

      Completion

      Essay

      Problem Solving

      Performance

III. Checklist for Evaluating Test Items

IV. References for Further Reading



I. Choosing Between Objective
and Subjective Test Items



There are two general categories of test items: (a) objective items, which require students to select the
correct response from several alternatives or to supply a word or short phrase to answer a question or
complete a statement, and (b) subjective or essay items, which permit the student to organize and present
an original answer. Objective items include multiple-choice, true-false, matching and completion, while
subjective items include short-answer essay, extended-response essay, problem solving and performance
test items. For some instructional purposes one or the other item type may prove more efficient and
appropriate. To begin our discussion of the relative merits of each type of test item, test your knowledge of
these two item types by answering the following questions.

Test Item Quiz

(circle the correct answer)

1. Essay exams are easier to construct than are objective exams.  T  F  ?

2. Essay exams require more thorough student preparation and study
   time than objective exams.  T  F  ?

3. Essay exams require writing skills where objective exams do not.  T  F  ?

4. Essay exams teach a person how to write.  T  F  ?

5. Essay exams are more subjective in nature than are objective exams.  T  F  ?

6. Objective exams encourage guessing more so than essay exams.  T  F  ?

7. Essay exams limit the extent of content covered.  T  F  ?

8. Essay and objective exams can be used to measure the same content
   or ability.  T  F  ?

9. Essay and objective exams are both good ways to evaluate a student's
   level of knowledge.  T  F  ?

(Quiz answers on following page.)

Quiz Answers



1. TRUE - Essay items are generally easier and less time consuming to construct than are most
   objective test items. Technically correct and content-appropriate multiple-choice and true-false
   test items require an extensive amount of time to write and revise. For example, a
   professional item writer produces only 9-10 good multiple-choice items in a day's time.

2. ? - According to research findings it is still undetermined whether or not essay tests require
   or facilitate more thorough (or even different) student study preparation.

3. TRUE - Writing skills do affect a student's ability to communicate the correct "factual" information
   through an essay response. Consequently, students with good writing skills have an
   advantage over students who have difficulty expressing themselves through writing.

4. FALSE - Essays do not teach a student how to write but they can emphasize the importance of
   being able to communicate through writing. Constant use of essay tests may encourage
   the knowledgeable but poor writing student to improve his/her writing ability in order
   to improve performance.

5. TRUE - Essays are more subjective in nature due to their susceptibility to scoring influences.
   Different readers can rate identical responses differently, the same reader can rate the
   same paper differently over time, the handwriting, neatness or punctuation can unintentionally
   affect a paper's grade and the lack of anonymity can affect the grading process.
   While impossible to eliminate, scoring influences or biases can be minimized through
   procedures discussed later in this booklet.

6. ? - Both item types encourage some form of guessing. Multiple-choice, true-false and matching
   items can be correctly answered through blind guessing, yet essay items can be responded
   to satisfactorily through well-written bluffing.

7. TRUE - Due to the extent of time required by the student to respond to an essay question, only a
   few essay questions can be included on a classroom exam. Consequently, a larger
   number of objective items can be tested in the same amount of time, thus enabling the
   test to cover more content.

8. TRUE - Both item types can measure similar content or learning objectives. Research has shown
   that students respond almost identically to essay and objective test items covering the
   same content. Studies* by Sax & Collet (1968) and Paterson (1926) conducted forty-two
   years apart reached the same conclusion:

      ". . . there seems to be no escape from the conclusions that the two types of exams
      are measuring identical things." (Paterson, p. 246)

   This conclusion should not be surprising; after all, a well-written essay item requires
   that the student (1) have a store of knowledge, (2) be able to relate facts and principles,
   and (3) be able to organize such information into a coherent and logical written expression,
   whereas an objective test item requires that the student (1) have a store of knowledge,
   (2) be able to relate facts and principles, and (3) be able to organize such information
   into a coherent and logical choice among several alternatives.

9. TRUE - Both objective and essay test items are good devices for measuring student achievement.
   However, as seen in the previous quiz answers, there are particular measurement situations
   where one item type is more appropriate than the other. Following is a set of recommendations
   for using either objective or essay test items. (Adapted from Robert L. Ebel,
   Essentials of Educational Measurement, 1972, p. 144).



* Gilbert Sax and LeVerne S. Collet, "An Empirical Comparison of the Effects of Recall and Multiple-Choice Tests on
Student Achievement," Journal of Educational Measurement, vol. 5 (1968), 169-73.

Donald G. Paterson, "Do New and Old Type Examinations Measure Different Mental Functions?" School and Society,
vol. 24 (August 21, 1926), 246-48.

When to Use Essay or Objective Tests

Essay tests are especially appropriate when:

- the group to be tested is small and the test is not to be reused.

- you wish to encourage and reward the development of student skill in writing.

- you are more interested in exploring the student's attitudes than in measuring his/her
  achievement.

- you are more confident of your ability as a critical and fair reader than as an imaginative writer
  of good objective test items.

Objective tests are especially appropriate when:

- the group to be tested is large and the test may be reused.

- highly reliable test scores must be obtained as efficiently as possible.

- impartiality of evaluation, absolute fairness, and freedom from possible test scoring influences
  (e.g., fatigue, lack of anonymity) are essential.

- you are more confident of your ability to express objective test items clearly than of your ability
  to judge essay test answers correctly.

- there is more pressure for speedy reporting of scores than for speedy test preparation.

Either essay or objective tests can be used to:

- measure almost any important educational achievement a written test can measure.

- test understanding and ability to apply principles.

- test ability to think critically.

- test ability to solve problems.

- test ability to select relevant facts and principles and to integrate them toward the solution of
  complex problems.

In addition to the preceding suggestions, it is important to realize that certain item types are better
suited than others for measuring particular learning objectives. For example, learning objectives requiring
the student to demonstrate or to show may be better measured by performance test items, whereas
objectives requiring the student to explain or to describe may be better measured by essay test items. The
matching of learning objective expectations with certain item types can help you select an appropriate
kind of test item for your classroom exam as well as provide a higher degree of test validity (i.e., testing
what is supposed to be tested). To further illustrate, several sample learning objectives and appropriate
test items are provided on the following page.

Learning Objective                        Most Suitable Test Item
----------------------------------------  ------------------------
The student will be able to categorize    Objective Test Item
and name the parts of the human           (M-C, T-F, Matching)
skeletal system.

The student will be able to critique      Essay Test Item
and appraise another student's English    (Extended-Response)
composition on the basis of its
organization.

The student will demonstrate safe         Performance Test Item
laboratory skills.

The student will be able to cite four     Essay Test Item
examples of satire that Twain uses in     (Short-Answer)
Huckleberry Finn.

After you have decided to use an objective exam, an essay exam, or both, the next step
is to select the kind(s) of objective or essay item that you wish to include on the exam. To help you make
such a choice, the different kinds of objective and essay items are presented in the following section of
this booklet. The various kinds of items are briefly described and compared to one another in terms of
their advantages and limitations for use. Also presented is a set of general suggestions for the construction
of each item variation.

II. Suggestions for Using and Writing Test Items



Multiple-Choice Test Items 

The multiple-choice item consists of two parts, (a) the stem, which identifies the question or problem 
and (b) the response alternatives. Students are asked to select the one alternative that best completes 
the statement or answers the question. For example, 

Sample Multiple-Choice Item 

(a) Item Stem: Which of the following is a chemical change?

(b) Response Alternatives:  a. Evaporation of alcohol
                            b. Freezing of water
                           *c. Burning of oil
                            d. Melting of wax

*correct response 

Advantages in Using Multiple-Choice Items 
Multiple-choice items can provide 

— versatility in measuring all levels of cognitive ability. 

— highly reliable test scores. 

— scoring efficiency and accuracy. 

— objective measurement of student achievement or ability.

— a wide sampling of content or objectives.

— a reduced guessing factor when compared to true-false items.

— different response alternatives which can provide diagnostic feedback. 

Limitations in Using Multiple-Choice Items
Multiple-choice items

— are difficult and time consuming to construct. 

— lead an instructor to favor simple recall of facts. 

— place a high degree of dependence on the student's reading ability and instructor's writing
ability.

Suggestions for Writing Multiple-Choice Test Items 

THE STEM 

1. When possible, state the stem as a direct question rather than as an incomplete statement.

Undesirable: Alloys are ordinarily produced by...
Desirable: How are alloys ordinarily produced?

2. Present a definite, explicit and singular question or problem in the stem.

Undesirable: Psychology...

Desirable: The science of mind and behavior is called . . . 

3. Eliminate excessive verbiage or irrelevant information from the stem. 

Undesirable: While ironing her formal, Jane burned her hand accidentally on the hot iron. This
was due to a transfer of heat by...

Desirable: Which of the following ways of heat transfer explains why Jane's hand was burned
after she touched a hot iron?

4. Include in the stem any word(s) that might otherwise be repeated in each alternative. 

Undesirable: In national elections in the United States the President is officially

a. chosen by the people.

b. chosen by members of Congress.

c. chosen by the House of Representatives.
*d. chosen by the Electoral College.

Desirable: In national elections in the United States the President is officially chosen by

a. the people. 

b. members of Congress. 

c. the House of Representatives. 
*d. the Electoral College. 

5. Use negatively stated stems sparingly. When used, underline and/or capitalize the negative word. 

Undesirable: Which of the following is not cited as an accomplishment of the Kennedy
administration?

Desirable: Which of the following is NOT cited as an accomplishment of the Kennedy
administration?

ITEM ALTERNATIVES 

6. Make all alternatives plausible and attractive to the less knowledgeable or skillful student.

What process is most nearly the opposite of photosynthesis?

Undesirable              Desirable

 a. Digestion             a. Digestion
 b. Relaxation            b. Assimilation
*c. Respiration          *c. Respiration
 d. Exertion              d. Catabolism

7. Make the alternatives grammatically parallel with each other, and consistent with the stem.

Undesirable: What would do most to advance the application of atomic discoveries to medicine ? 

*a. Standardized techniques for treatment of patients.

b. Train the average doctor to apply radioactive treatments.

c. Remove the restriction on the use of radioactive substances.

d. Establishing hospitals staffed by highly trained radioactive therapy specialists.

Desirable: What would do most to advance the application of atomic discoveries to medicine ? 

*a. Development of standardized techniques for treatment of patients. 

b. Training of the average doctor in application of radioactive treatments.

c. Removal of restriction on the use of radioactive substances. 

d. Addition of trained radioactive therapy specialists to hospital staffs. 

8. Make the alternatives mutually exclusive. 

Undesirable: The daily minimum required amount of milk that a 10-year-old child should drink
is

a. 1-2 glasses.
*b. 2-3 glasses.
*c. 3-4 glasses.

d. at least 4 glasses.

Desirable: What is the daily minimum required amount of milk a 10-year-old child should
drink?

a. 1 glass

b. 2 glasses 
*c. 3 glasses 

d. 4 glasses 

9. When possible, present alternatives in some logical order (e.g., chronological, most to least,
alphabetical).

At 7 a.m., two trucks leave a diner and travel north. One truck averages 42 miles per hour and
the other truck averages 38 miles per hour. At what time will they be 24 miles apart?

Undesirable        Desirable

 a. 6 p.m.          a. 1 a.m.
 b. 9 p.m.          b. 6 a.m.
 c. 1 a.m.          c. 9 a.m.
*d. 1 p.m.         *d. 1 p.m.
 e. 6 a.m.          e. 6 p.m.

10. If you have decided to use a traditional single correct answer format, be sure there is only one correct
or best answer.

Undesirable : The two most desired characteristics in a classroom test are validity and 

a. precision. 
*b. reliability. 

c. objectivity. 
*d. consistency.

Desirable: The two most desired characteristics in a classroom test are validity and



a. precision. 
*b. reliability. 

c. objectivity. 

d. standardization. 

11. Make alternatives approximately equal in length. 

Undesirable: The most general cause of low individual incomes in the United States is

*a. lack of valuable productive services to sell. 

b. unwillingness to work. 

c. automation. 

d. inflation. 

Desirable: What is the most general cause of low individual incomes in the United States? 

*a. A lack of valuable productive services to sell.

b. The population's overall unwillingness to work.

c. The nation's increased reliance on automation.

d. An increasing national level of inflation. 

12. Avoid irrelevant clues such as grammatical structure, well known verbal associations or connections
between stem and answer.

Undesirable: A chain of islands is called an:
(grammatical clue)

*a. archipelago.
 b. peninsula.
 c. continent.
 d. isthmus.

Undesirable: The reliability of a test can be estimated by a coefficient of:
(verbal association clue)

 a. measurement.
*b. correlation.
 c. testing.
 d. error.

Undesirable: The height to which a water dam is built depends on
(connection between stem and answer clue)

 a. the length of the reservoir behind the dam.
 b. the volume of water behind the dam.
*c. the height of water behind the dam.
 d. the strength of the reinforcing wall.

13. Use at least four alternatives for each item to lower the probability of getting the item correct by 
guessing. 

14. Randomly distribute the correct response among the alternative positions throughout the test,
so that alternatives a, b, c, d and e are each the correct response in approximately the same proportion.



15. Use the alternatives "none of the above" and "all of the above" sparingly. When used, such alterna- 
tives should occasionally be used as the correct response. 
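
Suggestions 13 and 14 can be checked with a few lines of code. The sketch below is only an illustration, not part of the original handbook: the function names guess_probability and balanced_key are hypothetical, and it assumes blind guessing with every alternative equally likely. It shows the chance of a correct guess for a given number of alternatives and one way of drawing an answer key in which each position is correct about equally often.

```python
import random
from collections import Counter
from typing import List, Optional


def guess_probability(num_alternatives: int) -> float:
    """Chance of answering a single item correctly by blind guessing."""
    if num_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    return 1.0 / num_alternatives


def balanced_key(num_items: int, labels: str = "abcd",
                 seed: Optional[int] = None) -> List[str]:
    """Return a shuffled answer key in which each position is the
    correct response about equally often (counts differ by at most one)."""
    rng = random.Random(seed)
    full_rounds, remainder = divmod(num_items, len(labels))
    # Every label appears full_rounds times; any leftover items get
    # distinct labels picked at random.
    key = list(labels) * full_rounds + rng.sample(list(labels), remainder)
    rng.shuffle(key)
    return key


if __name__ == "__main__":
    for k in (2, 4, 5):
        print(f"{k} alternatives: chance of a correct guess = {guess_probability(k):.2f}")
    key = balanced_key(20, labels="abcde", seed=1)
    print("".join(key), dict(Counter(key)))
```

With four alternatives the chance of a correct blind guess is 0.25, compared with 0.50 for a two-choice item. The key is drawn first and the alternatives of each item are then arranged to fit it; where suggestion 9 (logical order) applies to an item, that ordering would normally take priority.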



True-False Test Items 



A true-false item can be written in one of three forms: simple, complex, or compound. Answers can consist
of only two choices (simple), more than two choices (complex), or two choices plus a conditional
completion response (compound). An example of each type of true-false item follows:

Sample True-False Item: Simple

The acquisition of morality is a developmental process.   True   False

Sample True-False Item: Complex

The acquisition of morality is a developmental process.   True   False   Opinion

Sample True-False Item: Compound

The acquisition of morality is a developmental process.   True   False

If this statement is false, what makes it false?



Advantages in Using True-False Items 
True-false items can provide

— the widest sampling of content or objectives per unit of testing time.

— scoring efficiency and accuracy.

— versatility in measuring all levels of cognitive ability.

— highly reliable test scores. 

— an objective measurement of student achievement or ability. 

Limitations in Using True-False Items
True-false items

— incorporate an extremely high guessing factor. For simple true-false items, each student has a
50/50 chance of correctly answering the item without any knowledge of the item's content.

— can often lead an instructor to write ambiguous statements due to the difficulty of writing
statements which are unequivocally true or false.

— do not discriminate between students of varying ability as well as other item types.

— can often include more irrelevant clues than do other item types. 

— can often lead an instructor to favor testing of trivial knowledge. 
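
To see how much the guessing factor matters over a whole test, the chance of reaching a given score by guessing alone can be worked out from the binomial distribution. The sketch below is an illustration only: the function name prob_at_least is hypothetical, and it assumes every item is answered by an independent blind guess.

```python
from math import comb


def prob_at_least(num_items: int, num_correct: int, p_guess: float = 0.5) -> float:
    """Probability of getting at least num_correct of num_items right
    when every item is answered by an independent blind guess."""
    return sum(
        comb(num_items, k) * p_guess ** k * (1 - p_guess) ** (num_items - k)
        for k in range(num_correct, num_items + 1)
    )


if __name__ == "__main__":
    # Ten simple true-false items (50/50 chance on each item).
    print(f"true-false, 7 or more of 10: {prob_at_least(10, 7):.3f}")
    # The same test written as four-alternative multiple-choice items.
    print(f"4-alternative, 7 or more of 10: {prob_at_least(10, 7, 0.25):.3f}")
```

On a ten-item simple true-false test, roughly one student in six who guesses on every item would still answer seven or more correctly, which is one reason to use more items or an item type with more alternatives.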

Suggestions for Writing True-False Test Items



1. Base true-false items upon statements that are absolutely true or false, without qualifications or
exceptions. 

Undesirable: Nearsightedness is hereditary in origin. 

Desirable: Geneticists and eye specialists believe that the predisposition to nearsightedness is 
hereditary. 

2. Express the item statement as simply and as clearly as possible. 

Undesirable: When you see a highway with a marker that reads, "Interstate 80," you know that the
construction and upkeep of that road is built and maintained by the state and federal
government.

Desirable: The construction and maintenance of interstate highways is provided by both state
and federal governments.

3. Express a single idea in each test item. 

Undesirable: Water will boil at a higher temperature if the atmospheric pressure on its surface is
increased and more heat is applied to the container.

Desirable: Water will boil at a higher temperature if the atmospheric pressure on its surface is
increased.

and/or

Water will boil at a higher temperature if more heat is applied to the container.

4. Include enough background information and qualifications so that the ability to respond correctly
to the item does not depend on some special, uncommon knowledge.

Undesirable: The second principle of education is that the individual gathers knowledge.

Desirable: According to John Dewey, the second principle of education is that the individual
gathers knowledge.

5. Avoid lifting statements from the text, lecture or other materials so that memory alone will not
permit a correct answer.

Undesirable: For every action there is an opposite and equal reaction.

Desirable: If you were to stand in a canoe and throw a life jacket forward to another canoe,
chances are your canoe would jerk backward.

6. Avoid using negatively stated item statements.

Undesirable: The Supreme Court is not composed of nine justices.
Desirable: The Supreme Court is composed of nine justices.

7. Avoid the use of unfamiliar vocabulary.

Undesirable: According to some politicians, the raison d'être for capital punishment is retribution.

Desirable: According to some politicians, justification for the existence of capital punishment is
retribution.

8. Avoid the use of specific determiners which would permit a test-wise but unprepared examinee to
respond correctly. Specific determiners refer to sweeping terms like "all," "always," "none," "never,"
"impossible," "inevitable," etc. Statements including such terms are likely to be false. On the other
hand, statements using qualifying determiners such as "usually," "sometimes," "often," etc., are
likely to be true. When statements do require the use of specific determiners, make sure they appear
in both true and false items.

Undesirable: (All) sessions of Congress are called by the President. (F)

The Supreme Court is (frequently) required to rule on the constitutionality of a law. (T)

An objective test is (generally) easier to score than an essay test. (T)

Desirable: (When specific determiners are used, reverse the expected outcomes.)

The sum of the angles of a triangle is (always) 180°. (T)

Each molecule of a given compound is chemically the same as (every) other molecule
of that compound. (T)

The galvanometer is the instrument (usually) used for the metering of electrical
energy used in a home. (F)

9. False items tend to discriminate more highly than true items. Therefore, use more false items than
true items (but no more than 15% additional false items).



Matching Test Items 



In general, matching items consist of a column of stimuli presented on the left side of the exam page and
a column of responses placed on the right side of the page. Students are required to match the response
associated with a given stimulus. For example,

Sample Matching Test Item

Directions: On the line to the left of each factual statement, write the letter of the principle which best
explains the statement's occurrence. Each principle may be used more than once.



Factual Statements

_____ 1. Fossils of primates first appear in the Cenozoic rock strata, while trilobite remains are
         found in the Proterozoic rocks.

_____ 2. The Arctic and Antarctic regions are sparsely populated.

_____ 3. Plants have no nervous system.

_____ 4. Large coal beds exist in Alaska.

Principles

a. There have been profound changes in the climate on earth.

b. Coordination and integration of action is generally slower in plants than in animals.

c. There is an increasing complexity of structure and functions from lower to higher
   forms of life.

d. All life comes from life and produces its own kind of living organism.

e. Light is a limiting factor to life.


Advantages in Using Matching Items 
Matching items 

— require short periods of reading and response time, allowing you to cover more content.

— provide objective measurement of student achievement or ability.

— provide highly reliable test scores.

— provide scoring efficiency and accuracy.

Limitations in Using Matching Items 

Matching items 

— have difficulty measuring learning objectives requiring more than simple recall of information. 

— are difficult to construct due to the problem of selecting a common set of stimuli and responses.

Suggestions for Writing Matching Test Items 

1. Include directions which clearly state the basis for matching the stimuli with the responses. Explain
whether or not a response can be used more than once and indicate where to write the answer.

Undesirable: Directions: Match the following.

Desirable: Directions: On the line to the left of each identifying location and characteristics in
Column I, write the letter of the country in Column II that is best
defined. Each country in Column II may be used more than once.

2. Use only homogeneous material in matching items. 

Undesirable: Directions: Match the following.

1. Water                                     A. NaCl
2. Discovered Radium                         B. Fermi
3. Salt                                      C. NH3
4. Year of the 1st Nuclear Fission by Man    D. H2O
5. Ammonia                                   E. 1942
                                             F. Curie

Desirable: Directions: On the line to the left of each compound in Column I, write the letter of
the compound's formula presented in Column II. Use each formula
only once.

Column I                 Column II

1. Water                 A. H2SO4
2. Salt                  B. HCl
3. Ammonia               C. NaCl
4. Sulfuric Acid         D. H2O
                         E. H2Cl



3. Arrange the list of responses in some systematic order if possible (e.g., chronological, alphabetical).

Directions: On the line to the left of each definition in Column I, write the letter of the defense
mechanism in Column II that is described. Use each defense mechanism only once.

Column I

_____ 1. Hunting for reasons to support one's beliefs.

_____ 2. Accepting the values and norms of others as one's own even when
         they are contrary to previously held values.

_____ 3. Attributing to others one's own unacceptable impulses, thoughts
         and desires.

_____ 4. Ignoring disagreeable situations, topics, sights.

Column II

Undesirable               Desirable

a. Rationalization        a. Denial of reality
b. Identification         b. Identification
c. Projection             c. Introjection
d. Introjection           d. Projection
e. Denial of reality      e. Rationalization



4. Avoid grammatical or other clues to the correct response. 

Undesirable: Directions: Match the following in order to complete the sentences on the left.

1. Igneous rocks are formed            A. a hardness of 7.
2. The formation of coal requires      B. with crystalline rock.
3. A geode is filled                   C. a metamorphic rock.
4. Feldspar is classified as           D. heat and pressure.
                                       E. through the solidification of
                                          molten lava.

Desirable: Avoid sentence completion due to grammatical clues.

5. Keep matching items brief, limiting the list of stimuli to under 10.

6. Include more responses than stimuli to help prevent answering through the process of elimination.

7. When possible, reduce the amount of reading time by including only short phrases or single words in
the response list.
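
Several of these suggestions are mechanical enough to check automatically before an exam is printed. The sketch below is an illustration only: the function name check_matching_item and its warning texts are hypothetical, alphabetical order stands in for "some systematic order," and applying suggestion 6 only when each response is used once is an assumption made here, not a rule stated in the handbook.

```python
from typing import List, Sequence


def check_matching_item(stimuli: Sequence[str], responses: Sequence[str],
                        reuse_allowed: bool = False) -> List[str]:
    """Return warnings for a draft matching item (empty list = no warnings)."""
    warnings = []
    # Suggestion 5: keep the list of stimuli under 10.
    if len(stimuli) >= 10:
        warnings.append("limit the list of stimuli to under 10")
    # Suggestion 6: extra responses hinder answering by elimination
    # (assumed here to matter only when each response is used once).
    if not reuse_allowed and len(responses) <= len(stimuli):
        warnings.append("include more responses than stimuli")
    # Suggestion 3: systematic order, checked here as alphabetical order.
    if list(responses) != sorted(responses, key=str.lower):
        warnings.append("arrange the responses in alphabetical order")
    return warnings


if __name__ == "__main__":
    definitions = ["definition 1", "definition 2", "definition 3", "definition 4"]
    draft = ["Rationalization", "Identification", "Projection",
             "Introjection", "Denial of reality"]
    print(check_matching_item(definitions, draft))
    print(check_matching_item(definitions, sorted(draft)))
```

Run on the defense mechanism example under suggestion 3, the unordered response list draws one warning and the alphabetized list draws none.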

Completion Test Items



The completion item requires the student to answer a question or to finish an incomplete statement by
filling in a blank with the correct word or phrase. For example,

Sample Completion Item

According to Freud, personality is made up of three major systems, the __________, the __________
and the __________.

Advantages in Using Completion Items 

Completion items

— can provide a wide sampling of content.

— can efficiently measure lower levels of cognitive ability.

— can minimize guessing as compared to multiple-choice or true-false items.

— can usually provide an objective measure of student achievement or ability.

Limitations in Using Completion Items 
Completion items 

— are difficult to construct so that the desired response is clearly indicated.

— have difficulty measuring learning objectives requiring more than simple recall of information.

— can often include more irrelevant clues than do other item types.

— are more time consuming to score when compared to multiple-choice or true-false items.

— are more difficult to score since more than one answer may have to be considered correct if the
item was not properly prepared.

Suggestions for Writing Completion Test Items

1. Omit only significant words from the statement.

Undesirable: Every atom has a central (core) called a nucleus.
Desirable: Every atom has a central core called a(n) (nucleus).

2. Do not omit so many words from the statement that the intended meaning is lost. 

Undesirable: The __________ were to Egypt as the __________ were to Persia and as
__________ were to the early tribes of Israel.

Desirable: The Pharaohs were to Egypt as the __________ were to Persia and as
__________ were to the early tribes of Israel.

3. Avoid grammatical or other clues to the correct response. 

Undesirable: Most of the United States' libraries are organized according to the (Dewey)
decimal system.

Desirable: Which organizational system is used by most of the United States' libraries?
(Dewey decimal)

4. Be sure there is only one correct response. 

Undesirable: Trees which shed their leaves annually are (seed-bearing, common).
Desirable: Trees which shed their leaves annually are called (deciduous).

5. Make the blanks of equal length. 

Undesirable: In Greek mythology, Vulcan was the son of ___(Jupiter)___ and _(Juno)_.
Desirable: In Greek mythology, Vulcan was the son of ___(Jupiter)___ and ___(Juno)___.

6. When possible, delete words at the end of the statement after the student has been presented a
clearly defined problem.

Undesirable: (122.5) is the molecular weight of KClO3.

Desirable: The molecular weight of KClO3 is (122.5).

7. Avoid lifting statements directly from the text, lecture or other sources.

8. Limit the required response to a single word or phrase.

Essay Test Items



The essay test is probably the most popular of all types of teacher-made tests. In general, a classroom
essay test consists of a small number of questions to which the student is expected to demonstrate
his/her ability to (a) recall factual knowledge, (b) organize this knowledge and (c) present the knowledge
in a logical, integrated answer to the question. An essay test item can be classified as either an
extended-response essay item or a short-answer essay item. The latter calls for a more restricted or limited
answer in terms of form or scope. An example of each type of essay item follows.

Sample Extended-Response Essay Item

Explain the difference between the S-R (Stimulus-Response) and the S-O-R
(Stimulus-Organism-Response) theories of personality. Include in your answer (a) brief descriptions of both
theories, (b) supporters of both theories and (c) research methods used to study each of the two
theories. (10 pts., 20 minutes)

Sample Short-Answer Essay Item

Identify research methods used to study the S-R (Stimulus-Response) and S-O-R
(Stimulus-Organism-Response) theories of personality. (5 pts., 10 minutes)

Advantages in Using Essay Items 
Essay items 

— are easier and less time consuming to construct than are most other item types. 

— provide a means for testing student ability to compose an answer and present it in a logical 
manner. 

— can efficiently measure higher order cognitive objectives (e.g., analysis, synthesis, evaluation). 

Limitations in Using Essay Items 
Essay items

— cannot measure a large amount of content or objectives. 

— generally provide low test and test scorer reliability. 

— require an extensive amount of instructor's time to read and grade. 

— generally do not provide an objective measure of student achievement or ability (subject to bias
on the part of the grader).

Suggestions for Writing Essay Test Items

1. Prepare essay items that elicit the type of behavior you want to measure.

Learning
Objective: The student will be able to explain how the normal curve serves as a statistical
model.

Undesirable: Describe a normal curve in terms of: symmetry, modality, kurtosis and skewness.

Desirable: Briefly explain how the normal curve serves as a statistical model for estimation and
hypothesis testing.

2. Phrase each item so that the student's task is clearly indicated. 

Undesirable: Discuss the economic factors which led to the stock market crash of 1929.

Desirable: Identify the three major economic conditions which led to the stock market crash of
1929. Discuss briefly each condition in correct chronological sequence and in one
paragraph indicate how the three factors were interrelated.

3. Indicate for each item a point value or weight and an estimated time limit for answering. 

Undesirable: Compare the writings of Bret Harte and Mark Twain in terms of settings, depth of
characterization, and dialogue styles of their main characters.

Desirable: Compare the writings of Bret Harte and Mark Twain in terms of settings, depth of
characterization, and dialogue styles of their main characters. (10 points, 20 minutes)

4. Ask questions that will elicit responses on which experts could agree that one answer is better than
another.

5. Avoid giving the student a choice among optional items as this greatly reduces the reliability of the
test.

6. It is generally recommended for classroom examinations to administer several short-answer items
rather than only one or two extended-response items.

Suggestions for Scoring Essay Items



1. Choose a scoring model. Two of the more common scoring models are ANALYTICAL SCORING and
GLOBAL QUALITY.

ANALYTICAL SCORING: Each answer is compared to an ideal answer and points are assigned for
the inclusion of necessary elements. Grades are based on the number of
accumulated points either absolutely (i.e., A = 10 or more points, B = 6-9
pts., etc.) or relatively (A = top 15% of scores, B = next 30% of scores, etc.)

GLOBAL QUALITY: Each answer is read and assigned a score (e.g., grade, total points) based
either on the total quality of the response or on the total quality of the
response relative to other student answers.

Example Essay Item and Grading Models

"Americans are a mixed-up people with no sense of ethical values. Everyone knows that baseball
is far less necessary than food and steel, yet they pay ball players a lot more than farmers and
steelworkers."

WHY? Use 3-4 sentences to indicate how an economist would explain the above situation.

Analytical Scoring

Necessary Elements to be Included in Response                          Points

Salaries are based on demand relative to supply of such services.        3
Excellent ball players are rare.                                         2
Ball clubs have a high demand for excellent players.                     2
Clarity of response.                                                     2

                                                                       9 pts.

Global Quality

Assign scores or grades on the overall quality of the written response as compared to an ideal answer. Or,
compare the overall quality of a response to other student responses by sorting the papers into three stacks:

Below Average                  Average                        Above Average

Read and sort each stack again and divide into three more stacks:

Below Average                  Average                        Above Average

Below Avg. Avg. Above Avg.     Below Avg. Avg. Above Avg.     Below Avg. Avg. Above Avg.

In total, nine discriminations can be used to assign test grades in this manner. The number of stacks or
discriminations can vary to meet your needs.

2. Try not to allow factors which are irrelevant to the learning outcomes
being measured to affect your grading (i.e., handwriting, spelling, neatness).

3. Read and grade all class answers to one item before going on to the next item. 

4. Read and grade the answers without looking at the students' names to avoid possible preferential
treatment.

5. Occasionally shuffle papers during the reading of answers to help avoid any systematic order effects
(i.e., Sally's "B" work always followed Jim's "A" work, thus it looked more like "C" work).

6. When possible, ask another instructor to read and grade your students' responses.
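
The two scoring models described under suggestion 1 can be sketched in a few lines of code. This is an illustration under stated assumptions, not a prescribed procedure: the names RUBRIC, analytic_score, STACKS and two_pass_level are hypothetical, the point values are those of the ball player example, and the two-pass sort is mapped onto levels 1 (lowest) to 9 (highest).

```python
from typing import Dict

# Necessary elements and maximum points from the analytical scoring example.
RUBRIC: Dict[str, int] = {
    "Salaries are based on demand relative to supply of such services": 3,
    "Excellent ball players are rare": 2,
    "Ball clubs have a high demand for excellent players": 2,
    "Clarity of response": 2,
}

# Stack labels used for each pass of the global quality sort.
STACKS = ("below average", "average", "above average")


def analytic_score(awarded: Dict[str, int], rubric: Dict[str, int] = RUBRIC) -> int:
    """Add up the points awarded for each necessary element."""
    total = 0
    for element, points in awarded.items():
        if element not in rubric:
            raise KeyError(f"not a rubric element: {element}")
        if not 0 <= points <= rubric[element]:
            raise ValueError(f"points out of range for: {element}")
        total += points
    return total


def two_pass_level(first_sort: str, second_sort: str) -> int:
    """Combine two three-stack sorts into one of nine levels (1 to 9)."""
    return 3 * STACKS.index(first_sort) + STACKS.index(second_sort) + 1


if __name__ == "__main__":
    answer = {
        "Salaries are based on demand relative to supply of such services": 3,
        "Excellent ball players are rare": 2,
        "Clarity of response": 1,
    }
    print(analytic_score(answer), "of", sum(RUBRIC.values()), "points")
    print("level", two_pass_level("average", "above average"), "of 9")
```

Writing the rubric down before reading any answers, as the dictionary forces you to do, is also a practical guard against the irrelevant factors named in suggestion 2.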

Problem Solving Test Items



Another form of a subjective test item is the problem solving or computational exam question. Such
items present the student with a problem situation or task and require a demonstration of work
procedures and a correct solution, or just a correct solution. This kind of test item is classified as a subjective
type of item due to the procedures used to score item responses. Instructors can assign full or partial
credit to either correct or incorrect solutions depending on the quality and kind of work procedures
presented. An example of a problem solving test item follows.

Example Problem Solving Test Item

It was calculated that 75 men could complete a strip on a new highway in 70 days. When work was
scheduled to commence, it was found necessary to send 25 men on another road project. How many
days longer will it take to complete the strip? Show your work for full or partial credit.
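
The work procedure an instructor might look for in this sample item can be written out as a short calculation. The sketch below is an illustration only; the function name extra_days is hypothetical, and it assumes every worker contributes at the same constant rate, so the job is a fixed number of man-days.

```python
def extra_days(workers_planned: int, days_planned: float, workers_removed: int) -> float:
    """Additional days needed when part of the crew is reassigned,
    assuming every worker contributes at the same constant rate."""
    total_work = workers_planned * days_planned          # job size in man-days
    workers_left = workers_planned - workers_removed
    if workers_left <= 0:
        raise ValueError("at least one worker must remain on the job")
    return total_work / workers_left - days_planned


if __name__ == "__main__":
    # 75 men x 70 days = 5,250 man-days; 50 men need 105 days.
    print(extra_days(75, 70, 25), "days longer")
```

Under that assumption the strip takes 35 days longer. Partial credit could be tied to the intermediate steps (total man-days, remaining crew, new duration).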

Advantages in Using Problem Solving Items 

Problem solving items 

— minimize guessing by requiring the students to provide an original response rather than to
select from several alternatives.

— are easier to construct than are multiple-choice or matching items.

— can most appropriately measure learning objectives which focus on the ability to apply skills or
knowledge in the solution of problems.

— can measure an extensive amount of content or objectives.

Limitations in Using Problem Solving Items

Problem solving items

— generally provide low test and test scorer reliability.

— require an extensive amount of instructor time to read and grade.

— generally do not provide an objective measure of student achievement or ability (subject to bias
on the part of the grader when partial credit is given).

Suggestions for Writing Problem Solving Test Items

1. Clearly identify and explain the problem.

Undesirable: During a car crash, the car slows down at the rate of 490 m/sec². What is the
magnitude and direction of the force acting on a 100-kg driver?

Desirable: During a car crash, the car slows down at the rate of 490 m/sec². Using the car as a
frame of reference, what is the magnitude and direction of the gram force acting on a
100-kg driver?

2. Provide directions which clearly inform the student of the type of response called for. 

Undesirable: An American tourist in Paris finds that he weighs 70 kilograms. When he left the
United States he weighed 144 pounds. What was his net change in weight?

Desirable: An American tourist in Paris finds that he weighs 70 kilograms. When he left the
United States he weighed 144 pounds. What was his net weight change in pounds?

3. State in the directions whether or not the student must show his/her work procedures for full or par- 
tial credit. 

Undesirable: A double concave lens is made of glass with n = 1.50. If the radii of curvature of the
two lens surfaces are both 30.0 cm, what is the focal length of the lens?

Desirable: A double concave lens is made of glass with n = 1.50. If the radii of curvature of the
two lens surfaces are both 30.0 cm, what is the focal length of the lens? Show your
work to receive full or partial credit.

4. Clearly separate item parts and indicate their point values. 

A man leaves his home and drives to a convention at an average rate of 50 miles per hour. Upon
arrival, he finds a telegram advising him to return at once. He catches a plane that takes him back at
an average rate of 300 miles per hour.

Undesirable: If the total traveling time was 1 3/4 hours, how long did it take him to fly back? How
far from his home was the convention?

Desirable: If the total traveling time was 1 3/4 hours:

(1) How long did it take him to fly back? (1 pt.)

(2) How far from his home was the convention? (1 pt.)
Show your work for full or partial credit.

5. Use figures, conditions and situations which create a realistic problem. 

Undesirable: An automobile weighing 2,840 N (about 640 pounds) is traveling at a speed of 300 
miles per hour. What is the car's kinetic energy? Show your work. (2 pts.) 

Desirable: An automobile weighing 14,200 N (about 3,200 pounds) is traveling at a speed of
12 m/sec. What is the car's kinetic energy? Show your work. (2 pts.)

6. Ask questions that elicit responses on which experts could agree that one solution and one or more
work procedures are better than others.

7. Work through each problem before classroom administration to double-check accuracy.
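
Suggestion 7 can be carried out with a short script that works each item from its stated figures. The sketch below does this for three of the sample items above. It is an illustration only: the function names are hypothetical, and the constants G (9.8 m/sec² for gravity) and LB_PER_KG (2.20462 pounds per kilogram) are assumed values that the items themselves do not state.

```python
from typing import Tuple

G = 9.8              # assumed acceleration of gravity, m/sec^2
LB_PER_KG = 2.20462  # assumed conversion factor, pounds per kilogram


def kinetic_energy_from_weight(weight_newtons: float, speed_m_per_sec: float) -> float:
    """Kinetic energy in joules: mass = weight / g, then (1/2) m v^2."""
    mass_kg = weight_newtons / G
    return 0.5 * mass_kg * speed_m_per_sec ** 2


def convention_trip(total_hours: float, drive_mph: float, fly_mph: float) -> Tuple[float, float]:
    """Return (one-way distance in miles, flying time in hours) when the
    same distance is driven out and flown back within total_hours."""
    distance = total_hours / (1 / drive_mph + 1 / fly_mph)
    return distance, distance / fly_mph


def net_change_pounds(weight_now_kg: float, weight_before_lb: float) -> float:
    """Net weight change in pounds (positive means a gain)."""
    return weight_now_kg * LB_PER_KG - weight_before_lb


if __name__ == "__main__":
    print(f"kinetic energy: {kinetic_energy_from_weight(14200, 12):,.0f} J")
    miles, hours = convention_trip(1.75, 50, 300)
    print(f"convention: {miles:.0f} miles away, {hours * 60:.0f} minutes by plane")
    print(f"tourist: {net_change_pounds(70, 144):+.1f} pounds")
```

Working the items this way also confirms that the figures are realistic (suggestion 5) and that each item has a single defensible answer: the convention is 75 miles away with a 15-minute flight back, and the tourist gained a little over 10 pounds.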



Performance Test Items 



A performance test item is designed to assess the ability of a student to perform correctly in a simulated
situation (i.e., a situation in which the student will be ultimately expected to apply his/her learning).
The concept of simulation is central in performance testing; a performance test will simulate to some
degree a real life situation to accomplish the assessment. In theory, a performance test could be
constructed for any skill and real life situation. In practice, most performance tests have been developed for
the assessment of vocational, managerial, administrative, leadership, communication, interpersonal
and physical education skills in various simulated situations. An illustrative example of a performance
test item is provided below.



Assume that some of the instructional objectives of an urban planning course include the development
of the student's ability to effectively use the principles covered in the course in various "real life"
situations common for an urban planning professional. A performance test item could measure this
development by presenting the student with a specific situation which represents a "real life" situation. For
example,

An urban planning bourd makes a last minute request for the professional to act as consultant and 
critique a written proposal which is to be considered in a board meeting that very evening. The profes- 
sional arrives before t.z meeting and has one hour t analyze the written proposal and prepare his 
critique. The critique presentation is then made verbally during the board meeting, reactions of mem 
hers of the board or the audience include requests for explanation of specific points or informed at 
tacks on the positions taken by the professional 

The performance test designed to simvlate this situation would require that the student to be tested 
role play tiie professional s part, while students or faculty act the other roles in the situation. Various 
aspects of the ''professional s"* performance would then be obser ved and rated by several judges with 
the necessary background. The ratings could then be used both to provide the student with a diagr 
sis of his/her strengths and weaknesses and to contribute to a. overall summary evaluation of the stu 
dent's abilities. 

Advantages in Using Performance Test Items 
Performance test items 

— can most appropriately measure learning objectives which focus on the abilit> of the students to 
apply skills or knowledge in real life situations. 

— usually provide a degree of test validity not possible with standard paper e nd pencil test items. 

— are useful for measuring learning objectives in the psychomotor domain. 

Limitations in Using Performance Test Items 
Periormance test items 

— are difficult and time consuming to construct. 

— are primarily used for testing students individually and not for testing groui:« Consequently, 
they are relatively costly, time consuming, and inconvenient forms of testing. 



Sample Performance Test Item 
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— generally provide low test and test scorer reliability, 

— /generally do not provide an objective measure of student achievement or ability' (subject to bias 
on the part of the observer/grader). 

Suggestions for Writing Performance Test Items

1. Prepare items that elicit the type of behavior you want to measure.

2. Clearly identify and explain the simulated situation to the student.

3. Make the simulated situation as "life-like" as possible.

4. Provide directions which clearly inform the students of the type of response called for.

5. When appropriate, clearly state time and activity limitations in the directions.

6. Adequately train the observer(s)/scorer(s) to ensure that they are fair in scoring the appropriate
behaviors.

III. Checklist for Evaluating Test Items



EVALUATE YOUR TEST ITEMS BY CHECKING THE SUGGESTIONS WHICH YOU FEEL YOU
HAVE FOLLOWED.

Multiple-Choice Test Items 

When possible, stated the stem as a direct question rather than as an incomplete statement.

Presented a definite, explicit and singular question or problem in the stem.

Eliminated excessive verbiage or irrelevant information from the stem.

Included in the stem any word(s) that might have otherwise been repeated in each alternative.

Used negatively stated stems sparingly. When used, underlined and/or capitalized the negative
word(s).

Made all alternatives plausible and attractive to the less knowledgeable or skillful student.

Made the alternatives grammatically parallel with each other, and consistent with the stem.

Made the alternatives mutually exclusive.

When possible, presented alternatives in some logical order (e.g., chronologically, most to least).

Made sure there was only one correct or best response per item.

Made alternatives approximately equal in length.

Avoided irrelevant clues such as grammatical structure, well known verbal associations or
connections between stem and answer.

Used at least four alternatives for each item.

Randomly distributed the correct response among the alternative positions throughout the test,
having approximately the same proportion of alternatives a, b, c, d, and e as the correct response.

Used the alternatives "none of the above" and "all of the above" sparingly. When used, such
alternatives were occasionally the correct response.
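
The checklist item on randomly distributing the correct response can be sketched in code. This is an illustrative sketch only, not a procedure given in the handbook; the function name `balanced_answer_key` and its parameters are hypothetical. It first assigns each position (a through e) an equal share of the items, gives any leftover items to randomly chosen positions, and then shuffles the order.

```python
import random
from collections import Counter


def balanced_answer_key(num_items: int, positions: str = "abcde", seed=None) -> list:
    """Return correct-answer positions for num_items items, each used about equally often."""
    rng = random.Random(seed)
    repeats, remainder = divmod(num_items, len(positions))
    # Every position appears `repeats` times; leftovers go to randomly chosen positions.
    key = list(positions) * repeats + rng.sample(list(positions), remainder)
    rng.shuffle(key)  # randomize where each position falls in the test
    return key


key = balanced_answer_key(23, seed=1)
print(Counter(key))  # position counts differ by at most one
```

The item writer would then place each item's correct alternative in the position the key gives. A fixed `seed` is used here only so the example is repeatable.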

True-False Test Items 

Based true-false items upon statements that are absolutely true or false, without qualifications
or exceptions.

Expressed the item statement as simply and as clearly as possible.

Expressed a single idea in each test item. 



Included enough background information and qualifications so that the ability to respond
correctly did not depend on some special, uncommon knowledge.

Avoided lifting statements from the text, lecture or other materials. 

Avoided using negatively stated item statements.

Avoided the use of unfamiliar language. 

Avoided the use of specific determiners such as "all," "always," "none," "never," etc., and
qualifying determiners such as "usually," "sometimes," "often," etc.

Used more false items than true items (but not more than 15% additional false items). 

Matching Test Items 

Included directions which clearly stated the basis for matching the stimuli with the response. 

Explained whether or not a response could be used more than once and indicated where to write 
the answer. 

Used only homogeneous material.

When possible, arranged the list of responses in some systematic order (e.g., chronologically,
alphabetically).

Avoided grammatical or other clues to the correct response. 
Kept items brief (limited the list of stimuli to under 10). 
Included more responses than stimuli. 

When possible, reduced the amount of reading time by including only short phrases or single
words in the response list.

Completion Test Items 

Omitted only significant words from the statement. 

Did not omit so many words from the statement that the intended meaning was lost. 
Avoided grammatical or other clues to the correct response. 
Included only one correct response per item. 
Made the blanks of equal length. 

When possible, deleted the words at the end of the statement after the student was presented 
with a clearly defined problem. 

Avoided lifting statements directly from the text, lecture or other sources.
Limited the required response to a single word or phrase. 



Essay Test Items 

Prepared items that elicited the type of behavior you wanted to measure. 

Phrased each item so that the student's task was clearly indicated.

Indicated for each item a point value or weight and an estimated time limit for answering. 

Asked questions that elicited responses on which experts could agree that one answer is better
than others.

Avoided giving the student a choice among optional items.

Administered several short-answer items rather than 1 or 2 extended response items. 

Grading Essay Test Items 

Selected an appropriate grading model. 

Tried not to allow factors which were irrelevant to the learning outcomes being measured to
affect your grading (e.g., handwriting, spelling, neatness).

Read and graded all class answers to one item before going on to the next item.

Read and graded the answers without looking at the student's name to avoid possible
preferential treatment.

Occasionally shuffled papers during the reading of answers. 

When possible, asked another instructor to read and grade your students' responses. 

Problem Solving Test Items 

Clearly identified and explained the problem to the student. 

Provided directions which clearly informed the student of the type of response called for.

Stated in the directions whether or not the student must show work procedures for full or partial
credit.

Clearly separated item parts and indicated their point values.

Used figures, conditions and situations which created a realistic problem.

Asked questions that elicited responses on which experts could agree that one solution and one
or more work procedures are better than others.

Worked through each problem before classroom administration. 

Performance Test Items 

Prepared items that elicited the type of behavior you wanted to measure.

Clearly identified and explained the simulated situation to the student.

Made the simulated situation as "life-like" as possible.

Provided directions which clearly informed the students of the type of response called for.

When appropriate, clearly stated time and activity limitations in the directions.

Adequately trained the observer(s)/scorer(s) to ensure that they were fair in scoring the
appropriate behaviors.
ILLINOIS STATE BOARD OF EDUCATION
DEPARTMENT OF SCHOOL IMPROVEMENT SERVICES 

Learning Assessment and School Improvement Plans 

Funded Projects 



The State Board of Education issued requests for proposals to develop and disseminate effective practices in
the Learning Assessment and School Improvement Plan process. Contact persons and descriptions of the
projects funded for this purpose are included in this list. Persons who want additional information may contact
the projects directly.



1.
District: Belleville Public School District 118*
Contact: Ronald B. Riegle, Assistant Superintendent
Address: 105 West "A" Street, Belleville, Illinois 62220
Phone: 618/233-2830
Title: Instructional Monitoring System
Abstract: The assessment system utilizes locally developed exit tests for grades 3, 6, and 8 corresponding
to district-developed skills continuums which represent skills and concepts within learning
areas and across grade levels. Test results are summarized in computer generated printouts
for individual students, classes, attendance centers, and the district. Procedures for use of the
data are also built into the system.


2.
District: Brookwood School District 167
Contact: Margot Schlenker, Principal
Address: 201 Glenwood-Dyer Road, Glenwood, Illinois 60425
Phone: 312/758-5190
Title: School Improvement Plan in Language Arts
Abstract: This project for evaluating student writing involves a format and process for analyzing,
compiling and reporting results of state reading and writing assessments and using item
analysis to identify school and student strengths and weaknesses.


3.
District: Chicago Public Schools
Contact: Dr. Josue M. Gonzalez, Director, Bureau of Resource Development
Address: 1819 West Pershing Road, Chicago, Illinois 60609
Phone: 312/890-8020
Title: A Proposal to Develop the Achievement Component for Assessment of Student Performance
in the Fine Arts
Abstract: This project is designed to provide instructionally relevant information to teachers and
administrators regarding a child's artistic aptitude. An instrument for assessing a child's
progress on (visual) art objectives is to be developed with these characteristics: teacher
administered and scored, objective and curricular based, unbiased and fair, reliable and valid,
inexpensive and convenient, theoretically sound and rational.



*Written reports about these projects are available at the 18 Educational Service Centers. Information on the other
projects will become available at a later date.

4.
University: College of Lake County
Contact: David Ross, Counselor
Address: 19351 West Washington Street, Grayslake, Illinois 60030
Phone: 312/223-6601, Ext. 352
Title: A Regional Assessment and Reporting System
Abstract: This project develops and implements a regional assessment and reporting system for grades
8 through 14, with particular attention on grades 8, 10, and 12 and post high school follow up.
Information is compiled by school and grade level from student assessments that present a
profile of academic skills attainment, career interests and educational plans. Various
educational options are outlined for each student.


5.
District: Evanston Township High School District 202
Contact: Dr. James E. Phillips, Assistant Superintendent
Address: 1600 Dodge Avenue, Evanston, Illinois 60204
Phone: 312/492-3800
Title: Alignment of Local Objectives with State Goals, Individual Student Improvement Plans,
Coordination of School Reforms at the Local Level
Abstract: This project addresses the alignment of local objectives with State Goals using a matrix as a
visual framework; prescriptive activities for students whose performance is one grade or
more below current placement; scheduling techniques and strategies commonly used in
business to organize comprehensive school improvement.


6.
District: Glen Ellyn Community Consolidated School District 89
Contact: Nan H. Sevy, Assistant Superintendent for Instruction
Address: 789 Sheehan Avenue, Glen Ellyn, Illinois 60137
Phone: 312/469-8900
Title: Development and Dissemination of Learning Assessment and School Improvement
Plan —Effective Practices
Abstract: This system builds a comprehensive calibrated (Rasch scaled) bank of items designed to test
local learning objectives with criterion-referenced tests. A microcomputer program is used to
apply the Rasch model to analyze and calibrate test items and a data base management
system to store and retrieve items.


7.
District: Grundy Kendall Educational Service Region
Contact: Rob Molek, Special Projects Consultant
Address: 105 Sage Street, Channahon, Illinois 60410
Phone: 815/467-4048
Title: Alignment of Local Objectives with State Goals, Item Analysis, Comprehensive Assessment
System
Abstract: This project provides a uniform, comprehensive training program on learning objectives/
assessment which contains various segments in writing objectives and developing
curriculum, managing curriculum and assessment by using computer software, reviewing
and determining computer assessment systems, collecting and disseminating items for
item banks.

8.
District: Hamilton County Community Unit School District 10
Contact: J. Kenneth Hill, Superintendent
Address: 109 North Washington, McLeansboro, Illinois 62859
Phone: 618/643-2328
Title: Computer-Assisted Profile Data Collection and Analysis for Elementary Schools
Abstract: The project develops templates for a school district profile using AppleWorks, a software
package which provides a database appropriate for small elementary attendance centers.
Three profile areas are included: student demographic data, student grade data, and
standardized achievement test data.


9.
District: Herrin Community Unit District 4*
Contact: Karry J. Revelle, Assistant Superintendent
Address: 700 North 10th Street, Herrin, Illinois 62948
Phone: 618/988-8024
Title: Student Mastery Through Instructional Improvement and Individual Planning
Abstract: This assessment system promotes student mastery of learning objectives through
instructional improvement and individual educational planning. Special emphasis is on identifying
"at risk" students and helping them master the basic skills.


10.
District: Homewood-Flossmoor High School District 233
Contact: Mrs. Leslie R. Wilson, Project Director
Address: 999 South Kedzie Avenue, Flossmoor, Illinois 60422-2299
Phone: 312/799-3000, Ext. 151
Title: A Prototype for Determining Validity and Reliability on Locally Developed Assessments
Abstract: The goal of this project is to identify a process of establishing validity and reliability for
locally designed tests of performance and knowledge in any K-12 learning area. A step-by-step
procedure determines reliability and validity of various types of locally developed
assessments. Language arts, fine arts, and physical development skills are the sample domain.


11.
University: Illinois State University
Contact: Maurice Scharton, Associate Professor
Address: 104 Lawrence, Normal, Illinois 61761
Phone: 309/438-7100
Title: Writing Assessment
Abstract: The purpose of this project is to develop writing assessment materials and workshops for local
districts in adapting state writing assessment methods to the measurement of local
objectives. The workshops provide administrators and teachers with experience and ability to
design assessment prompts (assignments), administer an assessment, develop a scoring guide
from sample essays, and score the samples both analytically and holistically.

12.
District: Joliet Township High School District 204*
Contact: Harold Miller, Assistant Superintendent for Instructional Services
Address: 201 East Jefferson Street, Joliet, Illinois 60432
Phone: 815/727-6981
Title: Development of a Common Testing System for Dual Districts
Abstract: The purpose of this project is to determine testing needs and involve teachers, using a "town
meeting" model, in the selection and development of a broad based program for learning
assessment and school improvement in the high school district and its nine feeder districts.


13.
District: Kildeer Countryside C.C.S.D. 96
Contact: Thomas W. Many, Assistant Superintendent for Instruction
Address: 777 Checker Drive, Buffalo Grove, Illinois 60089
Phone: 312/459-4260
Title: Developing a Comprehensive Assessment Program for Dual District School System
Abstract: This project involves development of assessment systems in dual district settings with special
emphasis on item banks linked directly to objectives. The process assists school personnel in
identifying key instructional goals, developing appropriate learning objectives, writing
assessment items, designing valid and reliable item banks, and reviewing student achievement
through consistent item-analysis procedures.


14.
District: Macon County Regional Office of Education
Contact: Neal Loveall, Assistant Regional Superintendent
Address: 2240 East Geddes Avenue, Decatur, Illinois 62526
Phone: 217/424-3403
Title: Helping Education Through Fine Arts Assessment
Abstract: Coded observation tools for use in assessing K-6 student learning in visual art and creative
dance/movement are developed in this project. The observation tools are field-tested, and
non-arts teachers are trained in their use.


15.
District: Mt. Prospect Township High School District 214
Contact: Marilynn J. Kulieke, Director of Research and Testing
Address: 799 West Kensington Road, Mt. Prospect, Illinois 60056
Phone: 312/259-5300
Title: Development of an Assessment Banking and Item Analysis System
Abstract: This project develops a computerized assessment banking system which incorporates the
elements lacking in commercial software. The assessment banking system, although generic to
all learning areas, is operational in mathematics.

16.
District: Naperville Community School District 203
Contact: Michael J. Harkins, Director of Program Development, Evaluation and Research
Address: 655 South Webster Street, Naperville, Illinois 60566
Phone: 312/420-6558
Title: Comprehensive Expressive Arts Assessment
Abstract: This project is designed to research, develop, test and disseminate a nontraditional Fine Arts
assessment. Several different types of nontraditional assessment procedures are developed to
match Fine Arts goals and objectives.


17.
District: Salem Community High School District 600
Contact: Dale Guthrie, Media Director
Address: Route 37 North, Salem, Illinois 62881
Phone: 618/548-0727
Title: Cooperative for Effective School Reforms
Abstract: A K-12, outcome-based education program involves content reading strategies which promote
critical thinking skills. The proposed process combines school improvement activities such as
outcome-based education, critical-thinking strategies and improvement of school report card
results.


18.
District: Schaumburg Community Consolidated District 54
Contact: Dr. William Kritzmire, Superintendent
Address: 524 East Schaumburg Road, Schaumburg, Illinois 60194
Phone: 312/885-6700
Title: Data-Based School Improvement Process
Abstract: This eight-step, data-based change process is carefully structured for analyzing and applying
data to make more informed decisions. Data gathered and measured against the district's
criteria provide the basis for the team to target its improvement effort, plan priorities, and
identify and implement a step-by-step plan.


19.
District: Southwestern District 9
Contact: Donald J. Stuckey, Associate Superintendent
Address: Post Office, Piasa, Illinois 62079
Phone: 618/729-3221
Title: An Effective Practices in Mathematics and Biological Sciences Learning Assessment and
School Improvement
Abstract: This project integrates the K-6 learning objectives in mathematics and sciences and builds an
assessment based on those integrated objectives. The process is implemented through an
instructional materials approach.

20.
District: Sparta Community Unit District 140*
Contact: Jim Macri, Administrative Assistant
Address: 123 West College Street, Sparta, Illinois 62286
Phone: 618/443-3822
Title: Outcome-Based Educational Program
Abstract: The district has implemented an outcome-based educational program (OBE) which relies
heavily on the mastery instructional strategies of Bloom and Hunter. The assessment system
includes a communication network among staff, pilot teaching of units utilizing mastery
strategies, documentation of student achievement on pilot units, and monitoring the impact
of OBE implementation.


21.
District: Springfield School District 186*
Contact: Dr. Robert C. Hill, Director of Instruction
Address: 1900 West Monroe, Springfield, Illinois 62704
Phone: 217/525-3026
Title: Comprehensive Assessment System, Nondiscriminatory Locally Designed Test Instruments,
Item Analysis; Scoring of Instruments; School Improvement
Abstract: This project addresses the development of a comprehensive assessment system drawing on
the district's language arts assessment program; an analysis of locally developed tests for
possible discriminatory characteristics and correcting the tests; procedures for analyzing
items on locally developed criterion-referenced tests using a computerized criterion-referenced test
scoring system; scoring criterion-referenced tests using traditional and nontraditional methods;
use of assessment results to bring about district-based or school-based school improvement.


22.
District: Urbana School District 116*
Contact: Don Holste, Associate Superintendent
Address: 1602 South Anderson Street, P.O. Box 3039, Urbana, Illinois 61801
Phone: 217/384-3636
Title: District Assessment Plan
Abstract: Features of the system include a plan for curriculum development, staff development efforts
related to that plan, assessment reflecting learner outcomes, multi-year student profiles, a
systematic instructional effort to reduce student test anxiety and a clinical reading/writing
lab program to assess tenth grade students.

23.
District: Western Community Unit School District 306
Contact: Larry Marsh, Superintendent
Address: Post Office Box 248, Sheffield, Illinois 61361
Phone: 815/454-2593
Title: A Cooperative Approach which Integrates the Scoring of Assessment Instruments Providing
Exemplary Reporting Procedures and Validity and Reliability of Locally Designed Testing
Instruments
Abstract: The project generates a generic set of materials for use by similar districts in their LAP/SIP
process. Trained consultants deliver the process within the Learning Assessment Cooperative
and beyond as requested. Phase I focuses on technology applications to the development and
alignment of assessment instruments, Phase II on reporting, materials development, and
consultants.


24.
District: Woodstock Community Unit District 200*
Contact: Thomas W. Reimer, Assistant Principal
Address: 501 West South Street, Woodstock, Illinois 60098
Phone: 815/338-8200
Title: Student Assessment System
Abstract: The assessment system is a cost-effective method for identifying concretely stated
expectations and determining if those expectations are being met. Expectancies were developed and
defined as outcomes to be mastered by the average student by the completion of a grade level.
Teachers used the lists of expectancies in developing assessment instruments.


25.
District: Zion Elementary District 6*
Contact: Eugene H. Latz, Superintendent
Address: 1716 27th Street, Zion, Illinois 60099
Phone: 312/872-5455
Title: District Assessment System
Abstract: The district assessment system demonstrates congruence between the written, taught, and
tested curriculum; it identifies the goals for learning but also specifies the emphasis given to
those goals. The system is an instructional tool to evaluate progress and provide diagnostic
information, identify instructional resources for remediation, evaluate instructional programs
and provide information for curricular planning, and provide accountability to the
community, school board and state.


26.
District: Zion-Benton Township High School District 126*
Contact: David H. Cox, Superintendent
Address: 1606 West 23rd Street, Zion, Illinois 60099
Phone: 312/746-1202
Title: The School Performance Concepts and Information Plan
Abstract: To go beyond student test data in monitoring the educational processes, the district collects
and analyzes five categories of information: external testing (e.g., achievement tests, SAT
and ACT results, and other test results); local departmental assessments (final examination
results and a grade distribution analysis); awards and interscholastic competition
performance; behavior and discipline (attendance, expulsions, suspensions and dropouts); and
constituent satisfaction (opinions of parents, teachers, students and graduates on the success of
the district).
Projects Referenced by Topic

Alignment of Local Objectives 
5, 7, 20

Assessment Systems 

1, 6, 7, 9, 12, 13, 15, 20, 21, 22, 24, 25, 26 

Coordination of School Reforms at the Local Level 
5 

Fine Arts 

3, 10, 14, 16 

Individual Student Improvement Plans 
4, 5, 9, 24 

Item Analysis/Item Banks 
6, 7, 13, 15, 21 

Language Arts 
2, 10, 21 

Mathematics 
15, 19 

Nondiscriminatory Locally Designed Testing Instruments 
21 

Physical Development and Health 
10 

Reading 
17, 22

Reliability 

(See Validity) 

Reporting System 
2, 4, 8, 23

School Improvement Based on Assessment Results 
17, 18, 21 

Science 
19 

Scoring Assessment Instruments 
21, 23

Validity and Reliability 
10, 13, 23

Writing Assessment 
2, 11, 21, 22